
LINEAR ALGEBRA AND MATRIX THEORY

Second Edition

By Evar D. Nering

Wiley

ABOUT THE AUTHOR 

EVAR D. NERING is Professor of Mathematics and Chairman of the Department at Arizona State University, where he has taught since 1960. Prior to that, he was Assistant Professor of Mathematics at the University of Minnesota (1948-1956), and Associate Professor at the University of Arizona (1956-1960). He received his A.B. and A.M., both in Mathematics, from Indiana University. In 1947, he earned an A.M. from Princeton University, and received his Ph.D. from the same university in 1948. He worked as a mathematician with Goodyear Aircraft Corporation from 1953 to 1954.

During the summers of 1958, 1960, and 1962, Dr. Nering was a visiting lecturer at the University of Colorado. During the summer of 1961, he served as a Research Mathematician at the Mathematics Research Center of the University of Wisconsin.








Linear Algebra and 
Matrix Theory 



second edition 



Evar D. Nering

Professor of Mathematics 
Arizona State University 



® 



John Wiley & Sons, Inc. 

New York London Sydney Toronto 






Copyright © 1963, 1970 by John Wiley & Sons, Inc. 

All rights reserved. No part of this book may be
reproduced by any means, nor transmitted, nor trans- 
lated into a machine language without the written 
permission of the publisher. 



Library of Congress Catalog Card Number: 76-9(646 

SBN 471 63178 7

Printed in the United States of America 

10 9 8 7 6 5 4 3 2 1



Preface to 
first edition 



The underlying spirit of this treatment of the theory of matrices is that of a 
concept and its representation. For example, the abstract concept of an 
integer is the same for all cultures and presumably should be the same to a 
being from another planet. But the various symbols we write down and 
carelessly refer to as "numbers" are really only representations of the 
abstract numbers. These representations should be called "numerals" and 
we should not confuse a numeral with the number it represents. Numerals 
of different types are the inventions of various cultures and individuals, and 
the superiority of one system of numerals over another lies in the ease with 
which they can be manipulated and the insight they give us into the nature 
of the numbers they represent. 

We happen to use numerals to represent things other than numbers. For 
example, we put numerals (not numbers) on the backs of football players 
to represent and identify them. This does not attribute to the football 
players any of the properties of the corresponding numbers, and the usual 
operations of arithmetic have no meaning in this context. No one would 
think of adding the halfback, 20, to the fullback, 40, to obtain the guard, 60. 
Matrices are used to represent various concepts with a wide variety of 
different properties. To cover these possibilities a number of different 
manipulations with matrices are introduced. In each situation the appro- 
priate manipulations that should be performed on a matrix or a set of 
matrices depend critically on the concepts represented. The student who 
learns the formalisms of matrix "arithmetic" without learning the under- 
lying concepts is in serious danger of performing operations which have no 
meaning for the problem at hand. 

The formal manipulations with matrices are relatively easy to learn, and 
few students have any difficulty performing the operations accurately, if 
somewhat slowly. If a course in matrix theory, however, places too much 
emphasis on these formal steps, the student acquires an ill-founded self- 
assurance. If he makes an error exactly analogous to adding halfbacks to 
fullbacks to obtain guards, he usually considers this to be a minor error 
(since each step was performed accurately) and does not appreciate how 
serious a blunder he has made. 
In even the simplest problems matrices can appear as representing several 




different types of concepts. For example, it is typical to have a problem in 
which some matrices represent vectors, some represent linear transforma- 
tions, and others represent changes of bases. This alone should make it 
clear that an understanding of the things represented is essential to a meaning- 
ful manipulation of the representing symbols. In courses in vector analysis 
and differential geometry many students have difficulty with the concepts 
of covariant vectors and contravariant vectors. The troubles stem almost
entirely from a confusion between the representing symbols and the things
represented. As long as a student thinks of n-tuples (the representing
symbols) as being identical with vectors (the things represented) he must 
think there are two different "kinds" of vectors all jumbled together and he 
sees no way to distinguish between them. There are, in fact, not two "kinds" 
of vectors. There are two entirely distinct vector spaces ; their representations 
happen to look alike, but they have to be manipulated differently. 

Although the major emphasis in this treatment of matrix theory is on 
concepts and proofs that some may consider abstract, full attention is given 
to practical computational procedures. In different problems different 
patterns of steps must be used, because the information desired is not the 
same in all. Fortunately, although the patterns change, there are only a 
few different types of steps. The computational techniques chosen here are 
not only simple and effective, but require a variety of steps that is particularly 
small. Because these same steps occur often, ample practice is provided 
and the student should be able to obtain more than adequate confidence 
in his skill. 

A single pattern of discussion recurs regularly throughout the book. 
First a concept is introduced. A coordinate system is chosen so that the 
concept can be represented by n-tuples or matrices. It is shown how the
representation of the concept must be changed if the coordinate system is 
chosen in a different way. Finally, an effective computational procedure is 
described by which a particular coordinate system can be chosen so that the 
representing n-tuples or matrices are either simple or yield significant in-
formation. In this way computational skill is supported by an understanding 
of the reasons for the various procedures and why they differ. Lack of this 
understanding is the most serious single source of difficulty for many students 
of matrix theory. 

The material contained in the first five chapters is intended for use in a 
one-semester, or one-quarter, course in the theory of matrices. The topics 
have been carefully selected and very few omissions should be made. The 
sections recommended for omission are designated with asterisks. The part 
of Section 4 of Chapter III following the proof of the Hamilton-Cayley 
Theorem can also be omitted without harm. No other omission in the first 
three chapters is recommended. For a one-quarter course the following 




additional omissions are recommended: Chapter IV, Sections 4 and 5; 
Chapter V, Sections 3, 4, 5, and 9. 

Chapter V contains two parallel developments leading to roughly analogous 
results. One, through Sections 1, 6, 7, 8, 10, 11, and 12, is rather formal 
and is based on the properties of matrices; the other, through Sections 
1, 3, 4, 5, 9, and 11, is more theoretical and is based on the properties of 
linear functionals. This latter material is, in turn, based on the material
in Sections 1-5 of Chapter IV. For this reason these omissions should not 
be made in a random manner. Omissions can be made to accommodate a 
one-quarter course, or for other purposes, by carrying one line of development 
to its completion and curtailing the other. 

The exercises are an important part of this book. The computational 
exercises in particular should be worked. They are intended to illustrate the 
theory, and their cumulative effect is to provide assurance and understanding. 
Numbers have been chosen so that arithmetic complications are minimized. 
A student who understands what he is doing should not find them lengthy 
or tedious. The theoretical problems are more difficult. Frequently they 
are arranged in sequence so that one exercise leads naturally into the next. 
Sometimes an exercise has been inserted mainly to provide a suggestive 
context for the next exercise. In other places large steps have been broken 
down into a sequence of smaller steps, and these steps arranged in a sequence 
of exercises. For this reason a student may find it easier to work ten exercises 
in a row than to work five of them by taking every other one. These exercises 
are numerous and the student should not let himself become bogged down 
by them. It is more important to keep on with the pace of development 
and to obtain a picture of the subject as a whole than it is to work every 
exercise. 

The last chapter on selected applications of linear algebra contains a lot 
of material in rather compact form. For a student who has been through 
the first five chapters it should not be difficult, but it is not easy reading for 
someone whose previous experience with linear algebra is in a substantially 
different form. Many with experience in the applications of mathematics 
will be unfamiliar with the emphasis on abstract and general methods. I 
have had enough experience in full-time industrial work and as a consultant 
to know that these abstract ideas are fully as practical as any concrete 
methods, and anyone who takes the trouble to familiarize himself with 
these ideas will find that this is true. In all engineering problems and most 
scientific problems it is necessary to deal with particular cases and with 
particular numbers. This is usually, however, only the last phase of the 
work. Initially, the problem must be dealt with in some generality until 
some rather important decisions can be made (and this is by far the more 
interesting and creative part of the work). At this stage methods which 




lead to understanding are to be preferred to methods which obscure under- 
standing in unnecessary formalisms. 

This book is frankly a textbook and not a treatise, and I have not attempted 
to give credits and references at each point. The material is a mosaic that 
draws on many sources. I must, however, acknowledge my tremendous 
debt to Professor Emil Artin, who was my principal source of mathematical 
education and inspiration from my junior year through my Ph.D. The 
viewpoint presented here, that matrices are mere representations of things 
more basic, is as close to his viewpoint as it is possible for an admiring 
student to come in adopting the ideas of his teacher. During my student 
days the book presenting this material in a form most in harmony with 
this viewpoint was Paul Halmos' elegant treatment in Finite Dimensional 
Vector Spaces, the first edition. I was deeply impressed by this book, and 
its influence on the organization of my text is evident. 



Evar D. Nering 



Tempe, Arizona 
January, 1963 



Preface to 
second edition 



This edition differs from the first primarily in the addition of new material. 
Although there are significant changes, essentially the first edition remains 
intact and embedded in this book. In effect, the material carried over from 
the first edition does not depend logically on the new material. Therefore, 
this first edition material can be used independently for a one-semester or 
one-quarter course. For such a one-semester course, the first edition usually 
required a number of omissions as indicated in its preface. This omitted 
material, together with the added material in the second edition, is suitable 
for the second semester of a two-semester course. 

The concept of duality receives considerably expanded treatment in this 
second edition. Because of the aesthetic beauty of duality, it has long been 
a favorite topic in abstract mathematics. I am convinced that a thorough 
understanding of this concept also should be a standard tool of the applied 
mathematician and others who wish to apply mathematics. Several sections 
of the chapter concerning applications indicate how duality can be used. 
For example, in Section 3 of Chapter V, the inner product can be used to 
avoid introducing the concept of duality. This procedure is often followed 
in elementary treatments of a variety of subjects because it permits doing 
some things with a minimum of mathematical preparation. However, the 
cost in loss of clarity is a heavy price to pay to avoid linear functionals. 
Using the inner product to represent linear functionals in the vector space 
overlays two different structures on the same space. This confuses concepts 
that are similar but essentially different. The lack of understanding which 
usually accompanies this shortcut makes facing a new context an unsure 
undertaking. I think that the use of the inner product to allow the cheap 
and early introduction of some manipulative techniques should be avoided. 
It is far better to face the necessity of introducing linear functionals at the 
earliest opportunity. 

I have made a number of changes aimed at clarification and greater 
precision. I am not an advocate of rigor for rigor's sake since it usually 
adds nothing to understanding and is almost always dull. However, rigor 
is not the same as precision, and algebra is a mathematical subject capable 
of both precision and beauty. However, tradition has allowed several 
situations to arise in which different words are used as synonyms, and all 







are applied indiscriminately to concepts that are not quite identical. For 
this reason, I have chosen to use "eigenvalue" and "characteristic value" to 
denote non-identical concepts; these terms are not synonyms in this text. 
Similarly, I have drawn a distinction between dual transformations and 
adjoint transformations. 

Many people were kind enough to give me constructive comments and 
observations resulting from their use of the first edition. All these comments 
were seriously considered, and many resulted in the changes made in this 
edition. In addition, Chandler Davis (Toronto), John H. Smith (Boston 
College), John V. Ryff (University of Washington), and Peter R. Christopher 
(Worcester Polytechnic) went through the first edition or a preliminary 
version of the second edition in detail. Their advice and observations were 
particularly valuable to me. To each and every one who helped me with 
this second edition, I want to express my debt and appreciation. 



Evar D. Nering 



Tempe, Arizona 
September, 1969 



Contents 



Introduction 

I Vector spaces 

1. Definitions 5 

2. Linear Independence and Linear Dependence 10 

3. Bases of Vector Spaces 15 

4. Subspaces 20 

II Linear transformations and matrices 

1. Linear Transformations 27 

2. Matrices 37 

3. Non-Singular Matrices 45 

4. Change of Basis 50 

5. Hermite Normal Form 53 

6. Elementary Operations and Elementary Matrices 57 

7. Linear Problems and Linear Equations 63 

8. Other Applications of the Hermite Normal Form 68 

9. Normal Forms 74 

*10. Quotient Sets, Quotient Spaces 78
*11. Hom(U, V) 83

III Determinants, eigenvalues, and similarity transformations 85 

1. Permutations 86 

2. Determinants 89 

3. Cofactors 93 

4. The Hamilton-Cayley Theorem 98 

5. Eigenvalues and Eigenvectors 104 

6. Some Numerical Examples 110 

7. Similarity 113 

*8. The Jordan Normal Form 118 

IV Linear functionals, bilinear forms, quadratic forms 128

1. Linear Functionals 129 

2. Duality 133 





3. Change of Basis 134 

4. Annihilators 138 

5. The Dual of a Linear Transformation 142 
*6. Duality of Linear Transformations 145 
*7. Direct Sums 147 

8. Bilinear Forms 156 

9. Quadratic Forms 160 

10. The Normal Form 162 

11. Real Quadratic Forms 168 

12. Hermitian Forms 170 

V Orthogonal and unitary transformations, normal matrices 175 

1. Inner Products and Orthonormal Bases 176 
*2. Complete Orthonormal Sets 182 

3. The Representation of a Linear Functional by an Inner Product 186 

4. The Adjoint Transformation 189 

5. Orthogonal and Unitary Transformations 194 

6. Orthogonal and Unitary Matrices 195 

7. Superdiagonal Form 199 

8. Normal Matrices 201 

9. Normal Linear Transformations 203 

10. Hermitian and Unitary Matrices 208 

11. Real Vector Spaces 209 

12. The Computational Processes 213 

VI Selected applications of linear algebra 219 

1. Vector Geometry 220 

2. Finite Cones and Linear Inequalities 229 

3. Linear Programming 239 

4. Applications to Communication Theory 253 

5. Vector Calculus 259 

6. Spectral Decomposition of Linear Transformations 270 

7. Systems of Linear Differential Equations 278 

8. Small Oscillations of Mechanical Systems 284 

9. Representations of Finite Groups by Matrices 292 

10. Application of Representation Theory to Symmetric Mechanical 
Systems 312 

Appendix 319 

Answers to selected exercises 325 

Index 345 



Introduction 



Many of the most important applications of mathematics involve what are 
known as linear methods. The idea of what is meant by a linear method 
applied to a linear problem or a linear system is so important that it deserves 
attention in its own right. We try to describe intuitively what is meant by 
a linear system and then give some idea of the reasons for the importance of 
the concept. 

As an example, consider an electrical network. When the network receives 
an input, an output from the network results. As is customary, we can con- 
sider combining two inputs by adding them and then putting their sum 
through the system. This sum input will also produce an output. If the 
output of the sum is the sum of the outputs the system is said to be additive. 
We can also modify an input by changing its magnitude, by multiplying the 
input by a constant factor. If the resulting output is also multiplied by the 
same factor the system is said to be homogeneous. If the system is both 
additive and homogeneous it is said to be linear. 
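The two defining properties are easy to test numerically. As an illustrative aside (not part of the text), the short Python sketch below checks additivity and homogeneity for a hypothetical system given by a fixed matrix acting on its input; the matrix and the test vectors are arbitrary choices made only for the example.

```python
import numpy as np

# Hypothetical "system": a fixed matrix acting on the input vector.
# Purely illustrative; the matrix is an arbitrary choice.
def system(x):
    A = np.array([[2.0, 1.0],
                  [0.0, 3.0]])
    return A @ x

x = np.array([1.0, 2.0])
y = np.array([-3.0, 0.5])
c = 4.0

additive = np.allclose(system(x + y), system(x) + system(y))
homogeneous = np.allclose(system(c * x), c * system(x))
print(additive, homogeneous)   # True True: the system behaves linearly
```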

The simplification in the analysis of a system that results from the knowl- 
edge, or the assumption, that the system is linear is enormous. If we know 
the outputs for a collection of different inputs, we know the outputs for all 
inputs that can be obtained by combining these inputs in various ways. 
Suppose, for example, that we are considering all inputs that are periodic 
functions of time with a given period. The theory of Fourier series tells us 
that, under reasonable restrictions, these periodic functions can be represented 
as sums of simple sine functions. Thus in analyzing the response of a linear 
system to a periodic input it is sufficient to determine the response when the 
input is a simple sine function. 

So many of the problems that we encounter are assumed to be linear 
problems and so many of the mathematical techniques developed are 




inherently linear that a catalog of the possibilities would be a lengthy under- 
taking. Potential theory, the theory of heat, and the theory of small vibrations 
of mechanical systems are examples of linear theories. In fact, it is not easy 
to find brilliantly successful applications of mathematics to non-linear 
problems. In many applications the system is assumed to be linear even 
though it is not. For example, the differential equation for a simple pendulum 
is not linear since the restoring force is proportional to the sine of the dis- 
placement angle. We usually replace the sine of the angle by the angle in 
radians to obtain a linear differential equation. For small angles this is a 
good approximation, but the real justification is that linear methods are 
available and easily applied. 

In mathematics itself the operations of differentiation and integration are 
linear. The linear differential equations studied in elementary courses are 
linear in the sense intended here. In this case the unknown function is the 
input and the differential operator is the system. Any physical problem that 
is describable by a linear differential equation, or system of linear differential 
equations, is also linear. 

Matrix theory, vector analysis, Fourier series, Fourier transforms, and 
Laplace transforms are examples of mathematical techniques which are 
particularly suited for handling linear problems. In order for the linear 
theory to apply to the linear problem it is necessary that what we have called 
"inputs and outputs" and "linear systems" be representable within the theory. 
In Chapter I we introduce the concept of vector space. The laws of combina- 
tion which will be defined for vector spaces are intended to make precise the 
meaning of our vague statement, "combining these inputs in various ways." 
Generally, one vector space will be used for the set of inputs and one vector 
space will be used for the set of outputs. We also need something to represent 
the "linear system," and for this purpose we introduce the concept of linear 
transformation in Chapter II. 

The next step is to introduce a practical method for performing the needed 
calculations with vectors and linear transformations. We restrict ourselves 
to the case in which the vector spaces are finite dimensional. Here it is 
appropriate to represent vectors by n-tuples and to represent linear trans-
formations by matrices. These representations are also introduced in 
Chapters I and II. 

Where the vector spaces are infinite dimensional other representations are 
required. In some cases the vectors may be represented by infinite sequences, 
or Fourier series, or Fourier transforms, or Laplace transforms. For 
example, it is now common practice in electrical engineering to represent 
inputs and outputs by Laplace transforms and to represent linear systems 
by still other Laplace transforms called transfer functions. 

The point is that the concepts of vector spaces and linear transformations 
are common to all linear methods while matrix theory applies to only those 




linear problems that are finite dimensional. Thus it is of practical value to 
discuss vector spaces and linear transformations as much as possible before 
introducing the formalisms of n-tuples and matrices. And, generally, proofs
that can be given without recourse to n-tuples and matrices will be shorter,
simpler, and clearer. 

The correspondences that can be set up between vectors and the n-tuples
which represent them, and between linear transformations and the matrices 
which represent them, are not unique. Therefore, we have to study the 
totality of all possible ways to represent vectors and linear transformations 
and the relations between these different representations. Each possible 
correspondence is called a coordinatization of the vector space, and the process
of changing from one correspondence to another is called a change of 
coordinates. 

Any property of the vector space or linear transformation which is 
independent of any particular coordinatization is called an invariant or 
geometric property. We are primarily interested in those properties of 
vectors and linear transformations which are invariant, and, if we use a 
coordinatization to establish such a property, we are faced with the problem 
of showing that the conclusion does not depend on the particular coordina- 
tization being used. This is an additional reason for preferring proofs which 
do not make use of any coordinatization. 

On the other hand, if a property is known to be invariant, we are free to 
choose any coordinate system we wish. In such a case it is desirable and 
advantageous to select a coordinate system in which the problem we wish 
to handle is particularly simple, or in which the properties we wish to establish 
are clearly revealed. Chapters III and V are devoted to methods for selecting
these advantageous coordinate systems. 

In Chapter IV we introduce ideas which allow us to define the concept 
of distance in our vector spaces. This accounts for the principal differences 
between the discussions in Chapters III and V. In Chapter III there is no 
restriction on the coordinate systems which are permitted. In Chapter V the 
only coordinate systems permitted are "Cartesian" ; that is, those in which 
the theorem of Pythagoras holds. This additional restriction in permissible 
coordinate systems means that it is more difficult to find advantageous 
coordinate systems. 

In addition to allowing us to introduce the concept of distance the material 
in Chapter IV is of interest in its own right. There we study linear forms, 
bilinear forms, and quadratic forms. They have application to a number of 
important problems in physics, chemistry, and engineering. Here too, 
coordinate systems are introduced to allow specific calculations, but proofs 
given without reference to any coordinate systems are preferred. 

Historically, the term "linear algebra" was originally applied to the study 
of linear equations, bilinear and quadratic forms, and matrices, and their 




changes under a change of variables. With the more recent studies of Hilbert 
spaces and other infinite dimensional vector spaces this approach has proved 
to be inadequate. New techniques have been developed which depend less 
on the choice or introduction of a coordinate system and not at all upon the 
use of matrices. Fortunately, in most cases these techniques are simpler than 
the older formalisms, and they are invariably clearer and more intuitive. 

These newer techniques have long been known to the working mathe- 
matician, but until very recently a curious inertia has kept them out of books 
on linear algebra at the introductory level. 

These newer techniques are admittedly more abstract than the older 
formalisms, but they are not more difficult. Also, we should not identify 
the word "concrete" with the word "useful." Linear algebra in this more 
abstract form is just as useful as in the more concrete form, and in most cases 
it is easier to see how it should be applied. A problem must be understood, 
formulated in mathematical terms, and analyzed before any meaningful 
computation is possible. If numerical results are required, a computational 
procedure must be devised to give the results with sufficient accuracy and 
reasonable efficiency. All the steps to the point where numerical results are 
considered are best carried out symbolically. Even though the notation and 
terminology of matrix theory is well suited for computation, it is not necessar- 
ily the best notation for the preliminary work. 

It is a curious fact that if we look at the work of an engineer applying 
matrix theory we will seldom see any matrices at all. There will be symbols 
standing for matrices, and these symbols will be carried through many steps 
in reasoning and manipulation. Only occasionally or at the end will any 
matrix be written out in full. This is so because the computational aspects 
of matrices are burdensome and unnecessary during the early phases of work 
on a problem. All we need is an algebra of rules for manipulating with them. 
During this phase of the work it is better to use some concept closer to the 
concept in the field of application and introduce matrices at the point where 
practical calculations are needed. 

An additional advantage in our studying linear algebra in its invariant 
form is that there are important applications of linear algebra where the under- 
lying vector spaces are infinite dimensional. In these cases matrix theory 
must be supplanted by other techniques. A case in point is quantum mechanics 
which requires the use of Hilbert spaces. The exposition of linear algebra 
given in this book provides an easy introduction to the study of such spaces. 

In addition to our concern with the beauty and logic of linear algebra in 
this form we are equally concerned with its utility. Although some hints 
of the applicability of linear algebra are given along with its development, 
Chapter VI is devoted to a discussion of some of the more interesting and 
representative applications. 



chapter I

Vector spaces



In this chapter we introduce the concepts of a vector space and a basis for 
that vector space. We assume that there is at least one basis with a finite 
number of elements, and this assumption enables us to prove that the 
vector space has a vast variety of different bases but that they all have the 
same number of elements. This common number is called the dimension 
of the vector space. 

For each choice of a basis there is a one-to-one correspondence between 
the elements of the vector space and a set of objects we shall call n-tuples. 
A different choice for a basis will lead to a different correspondence between 
the vectors and the n-tuples. We regard the vectors as the fundamental
objects under consideration and the n-tuples as representations of the vectors.
Thus, how a particular vector is represented depends on the choice of the 
basis, and these representations are non-invariant. We call the n-tuple the 
coordinates of the vector it represents; each basis determines a coordinate 
system. 

We then introduce the concept of subspace of a vector space and develop 
the algebra of subspaces. Under the assumption that the vector space is 
finite dimensional, we prove that each subspace has a basis and that for 
each basis of the subspace there is a basis of the vector space which includes 
the basis of the subspace as a subset. 

1 | Definitions

To deal with the concepts that are introduced we adopt some notational 
conventions that are commonly used. We usually use sans-serif italic letters
to denote sets.

α ∈ S means α is an element of the set S.

α ∉ S means α is not an element of the set S.

S ⊂ T means S is a subset of the set T.

S ∩ T denotes the intersection of the sets S and T, the set of elements in both S and T.

S ∪ T denotes the union of the sets S and T, the set of elements in S or T.

T − S denotes the set of elements in T but not in S. In case T is the set of all objects under consideration, we shall call T − S the complement of S and denote it by CS.

S_μ, μ ∈ M denotes a collection of sets indexed so that one set S_μ is specified for each element μ ∈ M. M is called the index set.

∩_{μ∈M} S_μ denotes the intersection of all sets S_μ, μ ∈ M.

∪_{μ∈M} S_μ denotes the union of all sets S_μ, μ ∈ M.

∅ denotes the set with no elements, the empty set.

A set will often be specified by listing the elements in the set or by giving a property which characterizes the elements of the set. In such cases we use braces: {α, β} is the set containing just the elements α and β, {α | P} is the set of all α with property P, {α_μ | μ ∈ M} denotes the set of all α_μ corresponding to μ in the index set M. We have such frequent use for the set of all integers or a subset of the set of all integers as an index set that we adopt a special convention for these cases. {α_i} denotes a set of elements indexed by a subset of the set of integers. Usually the same index set is used over and over. In such cases it is not necessary to repeat the specifications of the index set and often designation of the index set will be omitted. Where clarity requires it, the index set will be specified. We are careful to distinguish between the set {α_i} and an element α_i of that set.

Definition. By a field F we mean a non-empty set of elements with two laws of combination, which we call addition and multiplication, satisfying the following conditions:

F1. To every pair of elements a, b ∈ F there is associated a unique element, called their sum, which we denote by a + b.

F2. Addition is associative; (a + b) + c = a + (b + c).

F3. There exists an element, which we denote by 0, such that a + 0 = a for all a ∈ F.

F4. For each a ∈ F there exists an element, which we denote by −a, such that a + (−a) = 0. Following usual practice we write b + (−a) = b − a.

F5. Addition is commutative; a + b = b + a.

F6. To every pair of elements a, b ∈ F there is associated a unique element, called their product, which we denote by ab, or a · b.

F7. Multiplication is associative; (ab)c = a(bc).

F8. There exists an element different from 0, which we denote by 1, such that a · 1 = a for all a ∈ F.

F9. For each a ∈ F, a ≠ 0, there exists an element, which we denote by a^(-1), such that a · a^(-1) = 1.

F10. Multiplication is commutative; ab = ba.

F11. Multiplication is distributive with respect to addition; (a + b)c = ac + bc.

The elements of F are called scalars, and will generally be denoted by lower case Latin italic letters.

The rational numbers, real numbers, and complex numbers are familiar 
and important examples of fields, but they do not exhaust the possibilities. 
As a less familiar example, consider the set {0, 1} where addition is defined by the rules: 0 + 0 = 1 + 1 = 0, 0 + 1 = 1; and multiplication is defined by the rules: 0 · 0 = 0 · 1 = 0, 1 · 1 = 1. This field has but two elements, and there are other fields with finitely many elements.
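As an illustrative aside (not part of the text), these rules are just arithmetic modulo 2, and a short Python sketch can tabulate them:

```python
# Addition and multiplication tables for the two-element field {0, 1}
# described above; both operations are arithmetic modulo 2.
elements = [0, 1]

add = {(a, b): (a + b) % 2 for a in elements for b in elements}
mul = {(a, b): (a * b) % 2 for a in elements for b in elements}

print(add)   # {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}  -- note 1 + 1 = 0
print(mul)   # {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
```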

We do not develop the various properties of abstract fields and we are 
not concerned with any specific field other than the rational numbers, the 
real numbers, and the complex numbers. We find it convenient and desirable 
at the moment to leave the exact nature of the field of scalars unspecified 
because much of the theory of vector spaces and matrices is valid for arbitrary 
fields. 

The student unacquainted with the theory of abstract fields will not be 
handicapped for it will be sufficient to think of F as being one of the familiar 
fields. All that matters is that we can perform the operations of addition and 
subtraction, multiplication and division, in the usual way. Later we have to 
restrict F to either the field of real numbers or the field of complex numbers 
in order to obtain certain classical results; but we postpone that moment as 
long as we can. At another point we have to make a very mild assumption, 
that is, 1 + 1 ≠ 0, a condition that happens to be false in the example given
above. The student interested mainly in the properties of matrices with real 
or complex coefficients should consider this to be no restriction. 

Definition. A vector space V over F is a non-empty set of elements, called vectors, with two laws of combination, called vector addition (or addition) and scalar multiplication, satisfying the following conditions:

A1. To every pair of vectors α, β ∈ V there is associated a unique vector in V called their sum, which we denote by α + β.

A2. Addition is associative; (α + β) + γ = α + (β + γ).

A3. There exists a vector, which we denote by 0, such that α + 0 = α for all α ∈ V.

A4. For each α ∈ V there exists an element, which we denote by −α, such that α + (−α) = 0.

A5. Addition is commutative; α + β = β + α.

B1. To every scalar a ∈ F and vector α ∈ V, there is associated a unique vector, called the product of a and α, which we denote by aα.

B2. Scalar multiplication is associative; a(bα) = (ab)α.

B3. Scalar multiplication is distributive with respect to vector addition; a(α + β) = aα + aβ.

B4. Scalar multiplication is distributive with respect to scalar addition; (a + b)α = aα + bα.

B5. 1 · α = α (where 1 ∈ F).

We generally use lower case Greek letters to denote vectors. An exception 
is the zero vector of A3. From a logical point of view we should not use the 
same symbol "0" for both the zero scalar and the zero vector, but this practice
is rooted in a long tradition and it is not as confusing as it may seem at first. 

The vector space axioms concerning addition alone have already appeared 
in the definition of a field. A set of elements satisfying the first four axioms 
is called a group. If the set of elements also satisfies A5 it is called a com- 
mutative group or abelian group. Thus both fields and vector spaces are 
abelian groups under addition. The theory of groups is well developed and 
our subsequent discussion would be greatly simplified if we were to assume 
a prior knowledge of the theory of groups. We do not assume a prior 
knowledge of the theory of groups; therefore, we have to develop some of 
their elementary properties as we go along, although we do not stop to point 
out that what was proved is properly a part of group theory. Except for 
specific applications in Chapter VI we do no more than use the term "group" 
to denote a set of elements satisfying these conditions. 

First, we give some examples of vector spaces. Any notation other than 
"F" for a field and "V" for a vector space is used consistently in the same 
way throughout the rest of the book, and these examples serve as definitions 
for these notations: 

(1) Let F be any field and let V = P be the set of all polynomials in an 
indeterminate x with coefficients in F. Vector addition is defined to be the 
ordinary addition of polynomials, and multiplication is defined to be the 
ordinary multiplication of a polynomial by an element of F. 

(2) For any positive integer n, let P_n be the set of all polynomials in x with coefficients in F of degree ≤ n − 1, together with the zero polynomial. The operations are defined as in Example (1).

(3) Let F = R, the field of real numbers, and take V to be the set of all real-valued functions of a real variable. If f and g are functions we define vector addition and scalar multiplication by the rules

(f + g)(x) = f(x) + g(x),
(af)(x) = a[f(x)].

(4) Let F = R, and let V be the set of continuous real-valued functions of a real variable. The operations are defined as in Example (3). The point of this example is that it requires a theorem to show that A1 and B1 are satisfied.

(5) Let F = R, and let V be the set of real-valued functions defined on the interval [0, 1] and integrable over that interval. The operations are defined as in Example (3). Again, the main point is to show that A1 and B1 are satisfied.

(6) Let F = R, and let V be the set of all real-valued functions of a real variable differentiable at least m times (m a positive integer). The operations are defined as in Example (3).

(7) Let F = R, and let V be the set of all real-valued functions differentiable at least twice and satisfying the differential equation d^2y/dx^2 + y = 0.

(8) Let F = R, and let V = R^n be the set of all ordered n-tuples of real numbers, α = (a_1, a_2, . . . , a_n) with a_i ∈ F. Vector addition and scalar multiplication are defined by the rules

(a_1, . . . , a_n) + (b_1, . . . , b_n) = (a_1 + b_1, . . . , a_n + b_n),
a(a_1, . . . , a_n) = (aa_1, . . . , aa_n).     (1.2)

We call this vector space the n-dimensional real coordinate space or the real affine n-space. (The name "Euclidean n-space" is sometimes used, but that term should be reserved for an affine n-space in which distance is defined.)

(9) Let F^n be the set of all n-tuples of elements of F. Vector addition and scalar multiplication are defined by the rules (1.2). We call this vector space an n-dimensional coordinate space.
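As an illustrative aside (not part of the text), the rules (1.2) can be written out directly for n-tuples represented as Python tuples:

```python
# The operations (1.2) on n-tuples: componentwise addition and scaling.
def vec_add(alpha, beta):
    # (a_1, ..., a_n) + (b_1, ..., b_n) = (a_1 + b_1, ..., a_n + b_n)
    return tuple(a + b for a, b in zip(alpha, beta))

def scalar_mul(a, alpha):
    # a(a_1, ..., a_n) = (a*a_1, ..., a*a_n)
    return tuple(a * x for x in alpha)

alpha = (2.0, -5.0, 0.0, 1.0)
beta = (-3.0, 3.0, 1.0, -1.0)
print(vec_add(alpha, beta))     # (-1.0, -2.0, 1.0, 0.0)
print(scalar_mul(3.0, alpha))   # (6.0, -15.0, 0.0, 3.0)
```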

An immediate consequence of the axioms defining a vector space is that the zero vector, whose existence is asserted in A3, and the negative vector, whose existence is asserted in A4, are unique. Specifically, suppose 0 satisfies A3 for all vectors in V and that for some α ∈ V there is a 0' satisfying the condition α + 0' = α. Then 0' = 0' + 0 = 0' + (α + (−α)) = (0' + α) + (−α) = (α + 0') + (−α) = α + (−α) = 0. Notice that we have proved not merely that the zero vector satisfying A3 for all α is unique; we have proved that a vector satisfying the condition of A3 for some α must be the zero vector, which is a much stronger statement.

Also, suppose that to a given α there were two negatives, (−α) and (−α)', satisfying the conditions of A4. Then (−α)' = (−α)' + 0 = (−α)' + α + (−α) = (−α) + α + (−α)' = (−α) + 0 = (−α). Both these demonstrations used the commutative law, A5. Use of this axiom could have been avoided, but the necessary argument would then have been somewhat longer.




Uniqueness enables us to prove that 0α = 0. (Here is an example of the seemingly ambiguous use of the symbol "0." The "0" on the left side is a scalar while that on the right is a vector. However, no other interpretation could be given the symbols and it proves convenient to conform to the convention rather than introduce some other symbol for the zero vector.) For each α ∈ V, α = 1 · α = (1 + 0)α = 1 · α + 0 · α = α + 0 · α. Thus 0 · α = 0. In a similar manner we can show that (−1)α = −α; α + (−1)α = (1 − 1)α = 0 · α = 0. Since the negative vector is unique we see that (−1)α = −α. It also follows similarly that a · 0 = 0.

EXERCISES 

1 to 4. What theorems must be proved in each of the Examples (4), (5), (6), and (7) to verify A1? To verify B1? (These axioms are usually the ones which require most specific verification. For example, if we establish that the vector space described in Example (3) satisfies all the axioms of a vector space, then A1 and B1 are the only ones that must be verified for Examples (4), (5), (6), and (7). Why?)

5. Let P^+ be the set of polynomials with real coefficients and positive constant term. Is P^+ a vector space? Why?

6. Show that if aα = 0 and a ≠ 0, then α = 0. (Hint: Use axiom F9 for fields.)

7. Show that if aα = 0 and α ≠ 0, then a = 0.

8. Show that the ξ such that α + ξ = β is (uniquely) ξ = β + (−α).

9. Let α = (2, −5, 0, 1) and β = (−3, 3, 1, −1) be vectors in the coordinate space R^4. Determine

(a) α + β.
(b) α − β.
(c) 3α.
(d) 2α + 3β.

10. Show that any field can be considered to be a vector space over itself. 

11. Show that the real numbers can be considered to be a vector space over the 
rational numbers. 

12. Show that the complex numbers can be considered to be a vector space over 
the real numbers. 

13. Prove the uniqueness of the zero vector and the uniqueness of the negative 
of each vector without using the commutative law, A5. 

2 | Linear Independence and Linear Dependence

Because of the associative law for vector addition, we can omit the parentheses from expressions like a_1α_1 + (a_2α_2 + a_3α_3) = (a_1α_1 + a_2α_2) + a_3α_3 and write them in the simpler form a_1α_1 + a_2α_2 + a_3α_3 = Σ_{i=1}^{3} a_iα_i. It is clear that this convention can be extended to a sum of any number of such terms provided that only a finite number of coefficients are different from zero. Thus, whenever we write an expression like Σ_i a_iα_i (in which we do not specify the range of summation), it will be assumed, tacitly if not explicitly, that the expression contains only a finite number of non-zero coefficients.

If β = Σ_i a_iα_i, we say that β is a linear combination of the α_i. We also say that β is linearly dependent on the α_i if β can be expressed as a linear combination of the α_i. An expression of the form Σ_i a_iα_i = 0 is called a linear relation among the α_i. A relation with all a_i = 0 is called a trivial linear relation; a relation in which at least one coefficient is non-zero is called a non-trivial linear relation.

Definition. A set of vectors is said to be linearly dependent if there exists a 
non-trivial linear relation among them. Otherwise, the set is said to be 
linearly independent. 

It should be noted that any set of vectors that includes the zero vector is linearly dependent. A set consisting of exactly one non-zero vector is linearly independent. For if aα = 0 with a ≠ 0, then α = 1 · α = (a^(-1)a)α = a^(-1)(aα) = a^(-1) · 0 = 0. Notice also that the empty set is linearly independent.

It is clear that the concept of linear independence of a set would be mean- 
ingless if a vector from a set could occur arbitrarily often in a possible relation. 
If a set of vectors is given, however, by itemizing the vectors in the set it is a 
definite inconvenience to insist that all the vectors listed be distinct. The 
burden of counting the number of times a vector can appear in a relation is 
transferred to the index set. For each index in the index set, we require that 
a linear relation contain but one term corresponding to that index. Similarly, 
when we specify a set by itemizing the vectors in the set, we require that one and 
only one vector be listed for each index in the index set. But we allow the 
possibility that several indices may be used to identify the same vector. Thus 
the set {α_1, α_2}, where α_1 = α_2, is linearly dependent, and any set with any vector listed at least twice is linearly dependent. To be precise, the concept of linear independence is a property of indexed sets and not a property of sets. In the example given above, the relation α_1 − α_2 = 0 involves two terms in the indexed set {α_i | i ∈ {1, 2}} while the set {α_1, α_2} actually contains only one vector. We should refer to the linear dependence of an indexed set rather
than the linear dependence of a set. The conventional terminology, which 
we are adopting, is inaccurate. This usage, however, is firmly rooted in 
tradition and, once understood, it is a convenience and not a source of 
difficulty. We speak of the linear dependence of a set, but the concept always 
refers to an indexed set. For a linearly independent indexed set, no vector can 
be listed twice; so in this case the inaccuracy of referring to a set rather than 
an indexed set is unimportant. 




The concept of linear dependence and independence is used in essentially two ways. (1) If a set {α_i} of vectors is known to be linearly dependent, there exists a non-trivial linear relation of the form Σ_i a_iα_i = 0. (This relation is not unique, but that is usually incidental.) There is at least one non-zero coefficient; let a_k be non-zero. Then α_k = Σ_{i≠k} (−a_k^(-1)a_i)α_i; that is, one of the vectors of the set {α_i} is a linear combination of the others. (2) If a set {α_i} of vectors is known to be linearly independent and a linear relation Σ_i a_iα_i = 0 is obtained, we can conclude that all a_i = 0. This seemingly trivial observation is surprisingly useful.

In Example (1) the zero vector is the polynomial with all coefficients equal to zero. Thus the set of monomials {1, x, x^2, . . .} is a linearly independent set. The set {1, x, x^2, x^2 + x + 1} is linearly dependent since 1 + x + x^2 − (x^2 + x + 1) = 0. In P_n of Example (2), any n + 1 polynomials form a linearly dependent set.

In R^3 consider the vectors {α = (1, 1, 0), β = (1, 0, 1), γ = (0, 1, 1), δ = (1, 1, 1)}. These four vectors are linearly dependent since α + β + γ − 2δ = 0, yet any three of these four vectors are linearly independent.
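As an illustrative aside (not part of the text), statements like these can be checked numerically for vectors in R^n: a finite set of vectors is linearly independent exactly when the matrix having them as rows has rank equal to the number of vectors. A Python sketch using numpy:

```python
import numpy as np
from itertools import combinations

# Rows are the vectors alpha, beta, gamma, delta of R^3 discussed above.
vectors = np.array([[1, 1, 0],
                    [1, 0, 1],
                    [0, 1, 1],
                    [1, 1, 1]], dtype=float)

# All four together have rank 3 < 4, so they are linearly dependent.
print(np.linalg.matrix_rank(vectors))                        # 3

# Every subset of three has rank 3, hence is linearly independent.
for rows in combinations(range(4), 3):
    print(rows, np.linalg.matrix_rank(vectors[list(rows)]))  # rank 3 each time
```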

Theorem 2.1. If α is linearly dependent on {β_i} and each β_i is linearly dependent on {γ_j}, then α is linearly dependent on {γ_j}.

proof. From α = Σ_i b_iβ_i and β_i = Σ_j c_ij γ_j it follows that α = Σ_i b_i (Σ_j c_ij γ_j) = Σ_j (Σ_i b_i c_ij) γ_j. □

Theorem 2.2. A set of non-zero vectors {α_1, α_2, . . .} is linearly dependent if and only if some α_k is a linear combination of the α_j with j < k.

proof. Suppose the vectors {α_1, α_2, . . .} are linearly dependent. Then there is a non-trivial linear relation among them; Σ_i a_iα_i = 0. Since a positive finite number of coefficients are non-zero, there is a last non-zero coefficient a_k. Furthermore, k ≥ 2 since α_1 ≠ 0. Thus α_k = Σ_{i<k} (−a_k^(-1)a_i)α_i.

The converse is obvious. □

For any subset A of V the set of all linear combinations of vectors in A is called the set spanned by A, and we denote it by ⟨A⟩. We also say that A spans ⟨A⟩. It is a part of this definition that A ⊂ ⟨A⟩. We also agree that the empty set spans the set consisting of the zero vector alone. It is readily apparent that if A ⊂ B, then ⟨A⟩ ⊂ ⟨B⟩.

In this notation Theorem 2.1 is equivalent to the statement: If A ⊂ ⟨B⟩ and B ⊂ ⟨C⟩, then A ⊂ ⟨C⟩.

Theorem 2.3. The set {α_i} of non-zero vectors is linearly independent if and only if for each k, α_k ∉ ⟨α_1, . . . , α_(k−1)⟩. (To follow our definitions exactly, the set spanned by {α_1, . . . , α_(k−1)} should be denoted by ⟨{α_1, . . . , α_(k−1)}⟩. We shall use the symbol ⟨α_1, . . . , α_(k−1)⟩ instead since it is simpler and there is no danger of ambiguity.)

proof. This is merely Theorem 2.2 in contrapositive form and stated in new notation. □

Theorem 2.4. If B and C are any subsets such that B ⊂ ⟨C⟩, then ⟨B⟩ ⊂ ⟨C⟩.

proof. Set A = ⟨B⟩ in Theorem 2.1. Then B ⊂ ⟨C⟩ implies that ⟨B⟩ = A ⊂ ⟨C⟩. □

Theorem 2.5. If α_k ∈ A is dependent on the other vectors in A, then ⟨A⟩ = ⟨A − {α_k}⟩.

proof. The assumption that α_k is dependent on A − {α_k} means that A ⊂ ⟨A − {α_k}⟩. It then follows from Theorem 2.4 that ⟨A⟩ ⊂ ⟨A − {α_k}⟩. The equality follows from the fact that the inclusion in the other direction is evident. □

Theorem 2.6. For any set C, ⟨⟨C⟩⟩ = ⟨C⟩.

proof. Setting B = ⟨C⟩ in Theorem 2.4 we obtain ⟨⟨C⟩⟩ = ⟨B⟩ ⊂ ⟨C⟩. Again, the inclusion in the other direction is obvious. □

Theorem 2.7. If a finite set A = {α_1, . . . , α_n} spans V, then every linearly independent set contains at most n elements.

proof. Let B = {β_1, β_2, . . .} be a linearly independent set. We shall successively replace the α_i by the β_i, obtaining at each step a new n-element set that spans V. Thus, suppose that A_k = {β_1, . . . , β_k, α_(k+1), . . . , α_n} is an n-element set that spans V. (Our starting point, the hypothesis of the theorem, is the case k = 0.) Since A_k spans V, β_(k+1) is dependent on A_k. Thus the set {β_1, . . . , β_k, β_(k+1), α_(k+1), . . . , α_n} is linearly dependent. In any non-trivial relation that exists the non-zero coefficients cannot be confined to the β_i, for they are linearly independent. Thus one of the α_i (i > k) is dependent on the others, and after reindexing {α_(k+1), . . . , α_n} if necessary we may assume that it is α_(k+1). By Theorem 2.5 the set A_(k+1) = {β_1, . . . , β_(k+1), α_(k+2), . . . , α_n} also spans V.

If there were more than n elements in B, we would in this manner arrive at the spanning set A_n = {β_1, . . . , β_n}. But then the dependence of β_(n+1) on A_n would contradict the assumed linear independence of B. Thus B contains at most n elements. □

Theorem 2.7 is stated in slightly different forms in various books. The 
essential feature of the proof is the step-by-step replacement of the vectors 
in one set by the vectors in the other. The theorem is known as the Steinitz 
replacement theorem. 






EXERCISES 

1. In the vector space P of Example (1) let p_1(x) = x^2 + x + 1, p_2(x) = x^2 − x − 2, p_3(x) = x^2 + x − 1, p_4(x) = x − 1. Determine whether or not the set {p_1(x), p_2(x), p_3(x), p_4(x)} is linearly independent. If the set is linearly dependent, express one element as a linear combination of the others.

2. Determine ⟨{p_1(x), p_2(x), p_3(x), p_4(x)}⟩, where the p_i(x) are the same polynomials as those defined in Exercise 1. (The set required is infinite, so that we cannot list all its elements. What is required is a description; for example, "all polynomials of a certain degree or less," "all polynomials with certain kinds of coefficients," etc.)

3. A linearly independent set is said to be maximal if it is contained in no larger 
linearly independent set. In this definition the emphasis is on the concept of set 
inclusion and not on the number of elements in a set. In particular, the definition 
allows the possibility that two different maximal linearly independent sets might 
have different numbers of elements. Find all the maximal linearly independent 
subsets of the set given in Exercise 1. How many elements are in each of them? 

4. Show that no finite set spans P; that is, show that there is no maximal finite 
linearly independent subset of P. Why are these two statements equivalent?

5. In Example (2) for n = 4, find a spanning set for P_4. Find a minimal spanning
set. Use Theorem 2.7 to show that no other spanning set has fewer elements. 

6. In Example (1) or (2) show that {1, x + 1, x^2 + x + 1, x^3 + x^2 + x + 1, x^4 + x^3 + x^2 + x + 1} is a linearly independent set.

7. In Example (1) show that the set of all polynomials divisible by x - 1 cannot 
span P. 

8. Determine which of the following sets in R^4 are linearly independent over R.

(a) {(1,1,0,1), (1,-1,1,1), (2,2,1,2), (0,1,0,0)}. 

(b) {(1,0,0, 1), (0,1,1,0), (1,0,1,0), (0,1,0,1)}. 

(c) {(1,0,0, 1), (0,1,0,1), (0,0,1,1), (1,1,1,1)}. 

9. Show that {e_1 = (1, 0, 0, . . . , 0), e_2 = (0, 1, 0, . . . , 0), . . . , e_n = (0, 0, 0, . . . , 1)} is linearly independent in F^n over F.

10. In Exercise 11 of Section 1 it was shown that we may consider the real numbers to be a vector space over the rational numbers. Show that {1, √2} is a linearly independent set over the rationals. (This is equivalent to showing that √2 is irrational.) Using this result show that {1, √2, √3} is linearly independent.

11. Show that if one vector of a set is the zero vector, then the set is linearly 
dependent. 

12. Show that if an indexed set of vectors has one vector listed twice, the set is 
linearly dependent. 

13. Show that if a subset of S is linearly dependent, then S is linearly dependent. 

14. Show that if a set S is linearly independent, then every subset of S is linearly 
independent. 




15. Show that if the set A = {α_1, . . . , α_n} is linearly independent and {α_1, . . . , α_n, β} is linearly dependent, then β is dependent on A.

16. Show that, if each of the vectors {β_0, β_1, . . . , β_n} is a linear combination of the vectors {α_1, . . . , α_n}, then {β_0, β_1, . . . , β_n} is linearly dependent.



3 | Bases of Vector Spaces

Definition. A linearly independent set spanning a vector space V is called a basis or base (the plural is bases) of V.

If A = {α_1, α_2, . . .} is a basis of V, by definition any α ∈ V can be written in the form α = Σ_i a_iα_i. The interesting thing about a basis, as distinct from other spanning sets, is that the coefficients are uniquely determined by α. For suppose that we also have α = Σ_i b_iα_i. Upon subtraction we get the linear relation Σ_i (a_i − b_i)α_i = 0. Since A is a linearly independent set, a_i − b_i = 0 and a_i = b_i for each i. A related fact is that a basis is a particularly efficient spanning set, as we shall see.
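As an illustrative aside (not part of the text), in a coordinate space the unique coefficients with respect to a given basis can be computed by solving a system of linear equations. The Python sketch below uses the basis {α, β, γ} of R^3 from Section 2; the particular vector ξ is an arbitrary choice made for the example.

```python
import numpy as np

# Columns of B are the basis vectors alpha = (1,1,0), beta = (1,0,1),
# gamma = (0,1,1) of R^3.  For xi in R^3 the coefficients a with
# xi = a_1*alpha + a_2*beta + a_3*gamma are the unique solution of B a = xi.
B = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1]], dtype=float)

xi = np.array([2.0, 3.0, 5.0])
a = np.linalg.solve(B, xi)
print(a)                       # [0. 2. 3.], the unique coefficients
print(np.allclose(B @ a, xi))  # True: xi is recovered from the basis
```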

In Example (1) the vectors {α_i = x^i | i = 0, 1, . . .} form a basis. We have already observed that this set is linearly independent, and it clearly spans the space of all polynomials. The space P_n has a basis with a finite number of elements: {1, x, x^2, . . . , x^(n−1)}.

The vector spaces in Examples (3), (4), (5), (6), and (7) do not have bases with a finite number of elements.

In Example (8) every R^n has a finite basis consisting of {α_i | α_i = (δ_1i, δ_2i, . . . , δ_ni)}. (Here δ_ij is the useful symbol known as the Kronecker delta. By definition δ_ij = 0 if i ≠ j and δ_ii = 1.)
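As an illustrative aside (not part of the text), this basis is simply the set of rows of the n × n identity matrix, so it is easy to produce and to test:

```python
import numpy as np

# The Kronecker-delta basis of R^n for n = 4: row i of the identity
# matrix is the n-tuple (delta_1i, delta_2i, ..., delta_ni).
n = 4
standard_basis = [np.eye(n)[i] for i in range(n)]

print(standard_basis[0])                  # [1. 0. 0. 0.]
print(np.linalg.matrix_rank(np.eye(n)))   # n, so the set is linearly independent
```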

Theorem 3.1. If a vector space has one basis with a finite number of 
elements, then all other bases are finite and have the same number of elements. 

proof. Let A be a basis with a finite number n of elements, and let B be any other basis. Since A spans V and B is linearly independent, by Theorem 2.7 the number m of elements in B must be at most n. This shows that B is finite and m ≤ n. But then the roles of A and B can be interchanged to obtain the inequality in the other order so that m = n. □

A vector space with a finite basis is called a finite dimensional vector 
space, and the number of elements in a basis is called the dimension of 
the space. Theorem 3.1 says that the dimension of a finite dimensional vector 
space is well defined. The vector space with just one element, the zero 
vector, has one linearly independent subset, the empty set. The empty set 
is also a spanning set and is therefore a basis of {0}. Thus {0} has dimension 
zero. There are very interesting vector spaces with infinite bases ; for example, 
P of Example (1). Moreover, many of the theorems and proofs we give are 
also valid for infinite dimensional vector spaces. It is not our intention, 
however, to deal with infinite dimensional vector spaces as such, and when- 
ever we speak of the dimension of a vector space without specifying whether 
it is finite or infinite dimensional we mean that the dimension is finite. 

Among the examples we have discussed so far, each P_n and each R^n is
n-dimensional. We have already given at least one basis for each. There
are many others. The bases we have given happen to be conventional and 
convenient choices. 

Theorem 3.2. Any n + 1 vectors in an n-dimensional vector space are 
linearly dependent. 
proof. Their independence would contradict Theorem 2.7. □ 

We have already seen that the four vectors {α = (1, 1, 0), β = (1, 0, 1),
γ = (0, 1, 1), δ = (1, 1, 1)} form a linearly dependent set in R^3. Since R^3
is 3-dimensional we see that this must be expected for any set containing
at least four vectors from R^3. The next theorem shows that each subset of
three is a basis.

Theorem 3.3. A set of n vectors in an n-dimensional vector space V is a 
basis if and only if it is linearly independent.

proof. The "only if" is part of the definition of a basis. Let A =
{α_1, ..., α_n} be a linearly independent set and let α be any vector in V.
Since {α_1, ..., α_n, α} contains n + 1 elements it must be linearly dependent.
Any non-trivial relation that exists must contain α with a non-zero coefficient,
for if that coefficient were zero the relation would amount to a relation in A.
Thus α is dependent on A. Hence A spans V and is a basis. □

Theorem 3.4. A set of n vectors in an n-dimensional vector space V is a 
basis if and only if it spans V.

proof. The "only if" is part of the definition of a basis. If n vectors did
span V and were linearly dependent, then (by Theorem 2.5) a proper subset
would also span V, contrary to Theorem 2.7. □

We see that a basis is a maximal linearly independent set and a minimal 
spanning set. This idea is made explicit in the next two theorems. 

Theorem 3.5. In a finite dimensional vector space, every spanning set 
contains a basis. 

proof. Let B be a set spanning V. If V = {0}, then ∅ ⊆ B is a basis of
{0}. If V ≠ {0}, then B must contain at least one non-zero vector α_1. We
now search for another vector in B which is not dependent on {α_1}. We
call this vector α_2 and search for another vector in B which is not dependent
on the linearly independent set {α_1, α_2}. We continue in this way as long as
we can, but the process must terminate as we cannot find more than n
linearly independent vectors in B. Thus suppose we have obtained the set
A = {α_1, ..., α_m} with the property that every vector in B is linearly dependent
on A. Then because of Theorem 2.1 the set A must also span V and
it is a basis. □

To drop the assumption that the vector space is finite dimensional would
change the complexion of Theorem 3.5 entirely. As it stands the theorem 
is interesting but minor, and not difficult to prove. Without this assumption 
the theorem would assert that every vector space has a basis since every 
vector space is spanned by itself. Discussion of such a theorem is beyond the 
aims of this treatment of the subject of vector spaces. 

Theorem 3.6. In a finite dimensional vector space any linearly independent 
set of vectors can be extended to a basis. 

proof. Let A = {α_1, ..., α_n} be a basis of V, and let B = {β_1, ..., β_m}
be a linearly independent set (m ≤ n). The set {β_1, ..., β_m, α_1, ..., α_n}
spans V. If this set is linearly dependent (and it surely is if m > 0) then
some element is a linear combination of the preceding elements (Theorem
2.2). This element cannot be one of the β's, for then B would be linearly
dependent. But then this α_i can be removed to obtain a smaller set spanning
V (Theorem 2.5). We continue in this way, discarding elements as long as
we have a linearly dependent spanning set. At no stage do we discard one of
the β's. Since our spanning set is finite this process must terminate with a
basis containing B as a subset. □
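
The proof of Theorem 3.6 is constructive, and the same sifting idea is easy to carry out numerically. The sketch below is only an illustration (it assumes Python with NumPy and uses a floating-point rank test in place of exact arithmetic): starting from the given independent vectors, it appends standard basis vectors one at a time, keeping only those that enlarge the rank.

import numpy as np

def extend_to_basis(independent, n):
    # Extend a linearly independent list of vectors in R^n to a basis of R^n.
    kept = [np.asarray(v, dtype=float) for v in independent]
    for e in np.eye(n):                                   # candidates from the standard basis
        if len(kept) == n:
            break
        if np.linalg.matrix_rank(np.vstack(kept + [e])) > len(kept):
            kept.append(e)                                # e is not dependent on the kept vectors
    return kept

for v in extend_to_basis([(1, 1, 0, 0), (0, 0, 1, 1)], 4):
    print(v)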

Theorem 3.6 is one of the most frequently used theorems in the book. 
It is often used in the following way. A non-zero vector with a certain desired 
property is selected. Since the vector is non-zero, the set consisting of that 
vector alone is a linearly independent set. An application of Theorem 3.6 
shows that there is a basis containing that vector. This is usually the first step 
of a proof by induction in which a basis is obtained for which all the vectors 
in the basis have the desired property. 

Let A = {α_1, ..., α_n} be an arbitrary basis of V, a vector space of dimension
n over the field F. Let α be any vector in V. Since A is a spanning set, α can be
represented as a linear combination of the form α = Σ_{i=1}^n a_i α_i. Since A is
linearly independent this representation is unique; that is, the coefficients a_i
are uniquely determined by α (for the given basis A). On the other hand,
for each n-tuple (a_1, ..., a_n) there is a vector in V of the form Σ_{i=1}^n a_i α_i.
Thus there is a one-to-one correspondence between the vectors in V and the
n-tuples (a_1, ..., a_n) ∈ F^n.

If α = Σ_{i=1}^n a_i α_i, the scalar a_i is called the i-th coordinate of α, and a_i α_i
is called the i-th component of α. Generally, coordinates and components
depend on the choice of the entire basis and cannot be determined from
individual vectors in the basis. Because of the rather simple correspondence 
between coordinates and components there is a tendency to confuse them and 
to use both terms for both concepts. Since the intended meaning is usually 
clear from context, this is seldom a source of difficulty. 

If α = Σ_{i=1}^n a_i α_i corresponds to the n-tuple (a_1, ..., a_n) and β = Σ_{i=1}^n b_i α_i
corresponds to the n-tuple (b_1, ..., b_n), then α + β = Σ_{i=1}^n (a_i + b_i) α_i
corresponds to the n-tuple (a_1 + b_1, ..., a_n + b_n). Also, aα = Σ_{i=1}^n aa_i α_i
corresponds to the n-tuple (aa_1, ..., aa_n). Thus the definitions of vector
addition and scalar multiplication among n-tuples defined in Example (9)
correspond exactly to the corresponding operations in V among the vectors
which they represent. When two sets of objects can be put into a one-to-one
correspondence which preserves all significant relations among their elements,
we say the two sets are isomorphic; that is, they have the same form. Using
this terminology, we can say that every vector space of dimension n over a
given field F is isomorphic to the n-dimensional coordinate space F^n. Two
sets which are isomorphic differ in details which are not related to their internal
structure. They are essentially the same. Furthermore, since two sets
isomorphic to a third are isomorphic to each other, we see that all n-dimensional
vector spaces over the same field of scalars are isomorphic.

The set of n-tuples together with the rules for addition and scalar multiplication
forms a vector space in its own right. However, when a basis is chosen
in an abstract vector space V the correspondence described above establishes 
an isomorphism between V and F n . In this context we consider F n to be a 
representation of V. Because of the existence of this isomorphism a study 
of vector spaces could be confined to a study of coordinate spaces. However, 
the exact nature of the correspondence between V and F n depends upon the 
choice of a basis in V. If another basis were chosen in V a correspondence 
between the α ∈ V and the n-tuples would exist as before, but the correspondence
would be quite different. We choose to regard the vector space V and the
vectors in V as the basic concepts and their representation by n-tuples as a tool
for computation and convenience. There are two important benefits from 
this viewpoint. Since we are free to choose the basis we can try to choose a 
coordinatization for which the computations are particularly simple or for 
which some fact that we wish to demonstrate is particularly evident. In 
fact, the choice of a basis and the consequences of a change in basis is the 
central theme of matrix theory. In addition, this distinction between a 
vector and its representation removes the confusion that always occurs when 
we define a vector as an n-tuple and then use another n-tuple to represent it.

Only the most elementary types of calculations can be carried out in the 
abstract. Elaborate or complicated calculations usually require the intro- 
duction of a representing coordinate space. In particular, this will be re- 
quired extensively in the exercises in this text. But the introduction of 
coordinates can result in confusions that are difficult to clarify without ex- 
tensive verbal description or awkward notation. Since we wish to avoid 
cumbersome notation and keep descriptive material at a minimum in the 
exercises, it is helpful to spend some time clarifying conventional notations 
and circumlocutions that will appear in the exercises. 

The introduction of a coordinate representation for V involves the selection
of a basis {α_1, ..., α_n} for V. With this choice α_1 is represented by (1, 0,
..., 0), α_2 is represented by (0, 1, 0, ..., 0), etc. While it may be necessary
to find a basis with certain desired properties, the basis that is introduced at
first is arbitrary and serves only to express whatever problem we face in a
form suitable for computation. Accordingly, it is customary to suppress
specific reference to the basis given initially. In this context it is customary
to speak of "the vector (a_1, a_2, ..., a_n)" rather than "the vector α whose
representation with respect to the given basis {α_1, ..., α_n} is (a_1, a_2, ..., a_n)."
Such short-cuts may be disgracefully inexact, but they are so common that
we must learn how to interpret them.

For example, let V be a two-dimensional vector space over R. Let A =
{α_1, α_2} be the selected basis. If β_1 = α_1 + α_2 and β_2 = −α_1 + α_2, then
B = {β_1, β_2} is also a basis of V. With the convention discussed above we
would identify α_1 with (1, 0), α_2 with (0, 1), β_1 with (1, 1), and β_2 with
(−1, 1). Thus, we would refer to the basis B = {(1, 1), (−1, 1)}. Since
α_1 = ½β_1 − ½β_2, α_1 has the representation (½, −½) with respect to the basis B.
If we are not careful we can end up by saying that "(1, 0) is represented by
(½, −½)."


EXERCISES 

To show that a given set is a basis by direct appeal to the definition means 
that we must show the set is linearly independent and that it spans V. In any 
given situation, however, the task is very much simpler. Since V is n-dimensional
a proposed basis must have n elements. Whether this is the case can be told at a 
glance. In view of Theorems 3.3 and 3.4 if a set has n elements, to show that it is a 
basis it suffices to show either that it spans V or that it is linearly independent. 
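
In the coordinate spaces R^n these checks come down to a single rank computation. The following sketch is merely illustrative (Python with NumPy assumed, and the helper name is ours): n vectors in an n-dimensional space form a basis exactly when the matrix having them as rows has rank n.

import numpy as np

def is_basis(vectors):
    # n vectors in R^n form a basis iff the matrix they make up has rank n.
    M = np.array(vectors, dtype=float)
    return M.shape[0] == M.shape[1] and np.linalg.matrix_rank(M) == M.shape[1]

print(is_basis([(1, 1, 0), (1, 0, 1), (0, 1, 1)]))   # True
print(is_basis([(1, 1, 0), (1, 0, 1), (2, 1, 1)]))   # False: third = first + second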

1. In R 3 show that {(1, 1,0), (1, 0, 1), (0, 1, 1)} is a basis by showing that it is 
linearly independent. 

2. Show that {(1, 1, 0), (1, 0, 1), (0, 1, 1)} is a basis by showing that ⟨(1, 1, 0),
(1, 0, 1), (0, 1, 1)⟩ contains (1, 0, 0), (0, 1, 0) and (0, 0, 1). Why does this suffice?

3. In R^4 let A = {(1, 1, 0, 0), (0, 0, 1, 1), (1, 0, 1, 0), (0, 1, 0, −1)} be a basis
(is it?) and let B = {(1, 2, −1, 1), (0, 1, 2, −1)} be a linearly independent set
(is it?). Extend B to a basis of R^4. (There are many ways to extend B to a basis.
It is intended here that the student carry out the steps of the proof of Theorem 3.6
for this particular case.)

4. Find a basis of R 4 containing the vector (1, 2, 3, 4). (This is another even 
simpler application of the proof of Theorem 3.6. This, however, is one of the most 
important applications of this theorem, to find a basis containing a particular 
vector.) 

5. Show that a maximal linearly independent set is a basis. 

6. Show that a minimal spanning set is a basis. 

4 | Subspaces

Definition. A subspace W of a vector space V is a non-empty subset of V
which is itself a vector space with respect to the operations of addition and
scalar multiplication defined in V. In particular, the subspace must be a
vector space over the same field F.

The first problem that must be settled is the problem of determining the
conditions under which a subset W is in fact a subspace. It should be clear
that axioms A2, A5, B2, B3, B4, and B5 need not be checked as they are valid
in any subset of V. The most innocuous conditions seem to be A1 and B1,
but it is precisely these conditions that must be checked. If B1 holds for a
non-empty subset W, there is an α ∈ W so that 0α = 0 ∈ W. Also, for each
α ∈ W, (−1)α = −α ∈ W. Thus A3 and A4 follow from B1 in any non-empty
subset of a vector space, and it is sufficient to check that W is non-empty
and closed under addition and scalar multiplication.

The two closure conditions can be combined into one statement: if
α, β ∈ W and a, b ∈ F, then aα + bβ ∈ W. This may seem to be a small
change, but it is a very convenient form of the conditions. It is also equivalent
to the statement that all linear combinations of elements in W are also in W;
that is, ⟨W⟩ = W. It follows directly from this statement that for any
subset A, ⟨A⟩ is a subspace. Thus, instead of speaking of the subset spanned
by A, we speak of the subspace spanned by A.
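
Deciding whether a particular vector lies in the subspace ⟨A⟩ spanned by a finite set A amounts to asking whether it is a linear combination of the vectors in A. The sketch below is an illustration only (Python with NumPy assumed; the helper name is ours): the vector belongs to ⟨A⟩ exactly when adjoining it to A does not raise the rank.

import numpy as np

def in_span(beta, A):
    # beta lies in <A> iff adding beta to A does not increase the rank.
    M = np.array(A, dtype=float)
    return np.linalg.matrix_rank(np.vstack([M, beta])) == np.linalg.matrix_rank(M)

A = [(1, 1, 0, 0), (1, 0, 1, 1)]
print(in_span((2, 1, 1, 1), A))   # True: it is the sum of the two spanning vectors
print(in_span((0, 0, 0, 1), A))   # False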

Every vector space V has V and the zero space {0} as subspaces. As a rule
we are interested in subspaces other than these, and to distinguish them we
call the subspaces other than V and {0} proper subspaces. In addition, if
W is a subspace we designate subspaces of W other than W and {0} as proper
subspaces of W.

In Examples (1) and (2) we can take a fixed finite set {x_1, x_2, ..., x_m}
of elements of F and define W to be the set of all polynomials p such that
p(x_1) = p(x_2) = ··· = p(x_m) = 0. To show that W is a subspace it is
sufficient to show that the sum of two polynomials which vanish at the
x_i also vanishes at the x_i, and the product of a scalar and a polynomial
vanishing at the x_i also vanishes at the x_i. What is the situation in P_n if
m ≥ n? Similar subspaces can be defined in Examples (3), (4), (5), (6),
and (7).

The space P_m is a subspace of P, and also a subspace of P_n for m ≤ n.

In R^n, for each m, 0 ≤ m ≤ n, the set of all α = (a_1, a_2, ..., a_n) such
that a_1 = a_2 = ··· = a_m = 0 is a subspace of R^n. This subspace is proper
if 0 < m < n.

Notice that the set of all n-tuples of rational numbers is a subset of R^n
and it is a vector space over the rational numbers, but it is not a subspace of
R^n since it is not a vector space over the real numbers. Why?

Theorem 4.1. The intersection of any collection of subspaces is a subspace. 

proof. Let {W_μ | μ ∈ M} be an indexed collection of subspaces of V.
∩_{μ∈M} W_μ is not empty since it contains 0. Let α, β ∈ ∩_{μ∈M} W_μ and a, b ∈ F.
Then α, β ∈ W_μ for each μ ∈ M. Since W_μ is a subspace, aα + bβ ∈ W_μ for
each μ ∈ M, and hence aα + bβ ∈ ∩_{μ∈M} W_μ. Thus ∩_{μ∈M} W_μ is a subspace. □

Let A be any subset of V, not necessarily a subspace. There exist subspaces
W_μ ⊆ V which contain A; in fact, V is one of them. The intersection
∩_{A⊆W_μ} W_μ of all such subspaces is a subspace containing A. It is the
smallest subspace containing A.

Theorem 4.2. For any A ⊆ V, ∩_{A⊆W_μ} W_μ = ⟨A⟩; that is, the smallest
subspace containing A is exactly the subspace spanned by A.

proof. Since ∩_{A⊆W_μ} W_μ is a subspace containing A, it contains all linear
combinations of elements of A. Thus ⟨A⟩ ⊆ ∩_{A⊆W_μ} W_μ. On the other
hand ⟨A⟩ is a subspace containing A; that is, ⟨A⟩ is one of the W_μ and hence
∩_{A⊆W_μ} W_μ ⊆ ⟨A⟩. Thus ∩_{A⊆W_μ} W_μ = ⟨A⟩. □

W_1 + W_2 is defined to be the set of all vectors of the form α_1 + α_2 where
α_1 ∈ W_1 and α_2 ∈ W_2.

Theorem 4.3. If W_1 and W_2 are subspaces of V, then W_1 + W_2 is a subspace
of V.

proof. If α = α_1 + α_2 ∈ W_1 + W_2, β = β_1 + β_2 ∈ W_1 + W_2, and a,
b ∈ F, then aα + bβ = a(α_1 + α_2) + b(β_1 + β_2) = (aα_1 + bβ_1) + (aα_2 +
bβ_2) ∈ W_1 + W_2. Thus W_1 + W_2 is a subspace. □

Theorem 4.4. W_1 + W_2 is the smallest subspace containing both W_1 and
W_2; that is, W_1 + W_2 = ⟨W_1 ∪ W_2⟩. If A_1 spans W_1 and A_2 spans W_2, then
A_1 ∪ A_2 spans W_1 + W_2.

proof. Since 0 ∈ W_1, W_2 ⊆ W_1 + W_2. Similarly, W_1 ⊆ W_1 + W_2.
Since W_1 + W_2 is a subspace containing W_1 ∪ W_2, ⟨W_1 ∪ W_2⟩ ⊆ W_1 + W_2.
For any α ∈ W_1 + W_2, α can be written in the form α = α_1 + α_2 where
α_1 ∈ W_1 and α_2 ∈ W_2. Then α_1 ∈ W_1 ⊆ ⟨W_1 ∪ W_2⟩ and α_2 ∈ W_2 ⊆ ⟨W_1 ∪
W_2⟩. Since ⟨W_1 ∪ W_2⟩ is a subspace, α = α_1 + α_2 ∈ ⟨W_1 ∪ W_2⟩. Thus
W_1 + W_2 = ⟨W_1 ∪ W_2⟩.

The second part of the theorem now follows directly. W_1 = ⟨A_1⟩ ⊆
⟨A_1 ∪ A_2⟩ and W_2 = ⟨A_2⟩ ⊆ ⟨A_1 ∪ A_2⟩ so that W_1 ∪ W_2 ⊆ ⟨A_1 ∪ A_2⟩ ⊆
⟨W_1 ∪ W_2⟩, and hence ⟨W_1 ∪ W_2⟩ = ⟨A_1 ∪ A_2⟩. □

Theorem 4.5. A subspace W of an n-dimensional vector space V is a finite
dimensional vector space of dimension m ≤ n.

proof. If W = {0}, then W is 0-dimensional. Otherwise, there is a non-zero
vector α_1 ∈ W. If ⟨α_1⟩ = W, W is 1-dimensional. Otherwise, there is an
α_2 ∉ ⟨α_1⟩ in W. We continue in this fashion as long as possible. Suppose we
have obtained the linearly independent set {α_1, ..., α_k} and that it does not
span W. Then there exists an α_{k+1} ∈ W, α_{k+1} ∉ ⟨α_1, ..., α_k⟩. In a linear
relation of the form Σ_{i=1}^{k+1} a_i α_i = 0 we could not have a_{k+1} ≠ 0, for then
α_{k+1} ∈ ⟨α_1, ..., α_k⟩. But then the relation reduces to the form Σ_{i=1}^{k} a_i α_i = 0.
Since {α_1, ..., α_k} is linearly independent, all a_i = 0. Thus {α_1, ..., α_k, α_{k+1}}
is linearly independent. In general, any linearly independent set in W that
does not span W can be expanded into a larger linearly independent set in W.
This process cannot go on indefinitely, for in that event we would obtain more
than n linearly independent vectors in V. Thus there exists an m such that
⟨α_1, ..., α_m⟩ = W. It is clear that m ≤ n. □

Theorem 4.6. Given any subspace W of dimension m in an n-dimensional
vector space V, there exists a basis {α_1, ..., α_m, α_{m+1}, ..., α_n} of V such that
{α_1, ..., α_m} is a basis of W.

proof. By the previous theorem we see that W has a basis {α_1, ..., α_m}.
This set is also linearly independent when considered in V, and hence by
Theorem 3.6 it can be extended to a basis of V. □

Theorem 4.7. If two subspaces U and W of a vector space V have the same
finite dimension and U ⊆ W, then U = W.

proof. By the previous theorem there exists a basis of U which can be
extended to a basis of W. But since dim U = dim W, the basis of W can
have no more elements than does the basis of U. This means a basis of
U is also a basis of W; that is, U = W. □

Theorem 4.8. If W_1 and W_2 are any two subspaces of a finite dimensional
vector space V, then dim (W_1 + W_2) = dim W_1 + dim W_2 − dim (W_1 ∩ W_2).

proof. Let {α_1, ..., α_r} be a basis of W_1 ∩ W_2. This basis can be
extended to a basis {α_1, ..., α_r, β_1, ..., β_s} of W_1 and also to a basis
{α_1, ..., α_r, γ_1, ..., γ_t} of W_2. It is clear that {α_1, ..., α_r, β_1, ..., β_s, γ_1,
..., γ_t} spans W_1 + W_2; we wish to show that this set is linearly independent.
Suppose Σ_i a_i α_i + Σ_j b_j β_j + Σ_k c_k γ_k = 0 is a linear relation. Then
Σ_i a_i α_i + Σ_j b_j β_j = −Σ_k c_k γ_k. The left side is in W_1 and the right side is in
W_2, and hence both are in W_1 ∩ W_2. Each side is then expressible as a
linear combination of the {α_i}. Since any representation of an element as a
linear combination of the {α_1, ..., α_r, β_1, ..., β_s} is unique, this means that
b_j = 0 for all j. By a symmetric argument we see that all c_k = 0. Finally,
this means that Σ_i a_i α_i = 0, from which it follows that all a_i = 0. This
shows that the spanning set {α_1, ..., α_r, β_1, ..., β_s, γ_1, ..., γ_t} is linearly
independent and a basis of W_1 + W_2. Thus dim (W_1 + W_2) = r + s + t =
(r + s) + (r + t) − r = dim W_1 + dim W_2 − dim (W_1 ∩ W_2). □

As an example, consider in R^3 the subspaces W_1 = ⟨(1, 0, 2), (1, 2, 2)⟩
and W_2 = ⟨(1, 1, 0), (0, 1, 1)⟩. Both subspaces are of dimension 2. Since
W_1 ⊆ W_1 + W_2 ⊆ R^3 we see that 2 ≤ dim (W_1 + W_2) ≤ 3. Because of
Theorem 4.8 this implies that 1 ≤ dim (W_1 ∩ W_2) ≤ 2. In more familiar
terms, W_1 and W_2 are planes in a 3-dimensional space. Since both planes
contain the origin, they do intersect. Their intersection is either a line or,
in case they coincide, a plane. The first problem is to find a basis for W_1 ∩ W_2.
Any α ∈ W_1 ∩ W_2 must be expressible in the forms α = a(1, 0, 2) +
b(1, 2, 2) = c(1, 1, 0) + d(0, 1, 1). This leads to the three equations:

a + b = c
2b = c + d
2a + 2b = d.

These equations have the solutions b = −3a, c = −2a, d = −4a. Thus
α = a(1, 0, 2) − 3a(1, 2, 2) = a(−2, −6, −4). As a check we also have
α = −2a(1, 1, 0) − 4a(0, 1, 1) = a(−2, −6, −4). We have determined
that {(1, 3, 2)} is a basis of W_1 ∩ W_2. Also {(1, 3, 2), (1, 0, 2)} is a basis
of W_1 and {(1, 3, 2), (1, 1, 0)} is a basis of W_2.
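
The same computation — equating a general element of W_1 with a general element of W_2 and solving the resulting homogeneous system — can be done by machine. The sketch below is illustrative only; it assumes Python with NumPy and SciPy's null_space helper, neither of which is part of the text.

import numpy as np
from scipy.linalg import null_space

w11, w12 = np.array([1., 0., 2.]), np.array([1., 2., 2.])   # spanning vectors of W1
w21, w22 = np.array([1., 1., 0.]), np.array([0., 1., 1.])   # spanning vectors of W2

# a*w11 + b*w12 = c*w21 + d*w22  <=>  (a, b, c, d) lies in the null space of M.
M = np.column_stack([w11, w12, -w21, -w22])
for a, b, c, d in null_space(M).T:          # one row per basis vector of the null space
    print(a * w11 + b * w12)                # spans W1 ∩ W2; a scalar multiple of (1, 3, 2)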

We are all familiar with the theorem from solid geometry to the effect 
that two non-parallel planes intersect in a line, and the example above is an 
illustration of that theorem. In spaces of dimension higher than 3, how- 
ever, it is possible for two subspaces of dimension 2 to have but one point 
in common. For example, in R^4 the subspaces W_1 = ⟨(1, 0, 0, 0), (0, 1, 0, 0)⟩
and W_2 = ⟨(0, 0, 1, 0), (0, 0, 0, 1)⟩ are each 2-dimensional and W_1 ∩
W_2 = {0}, W_1 + W_2 = R^4.

Those cases in which dim (W_1 ∩ W_2) = 0 deserve special mention. If
W_1 ∩ W_2 = {0} we say that the sum W_1 + W_2 is direct: W_1 + W_2 is a
direct sum of W_1 and W_2. To indicate that a sum is direct we use the notation
W_1 ⊕ W_2. For α ∈ W_1 ⊕ W_2 there exist α_1 ∈ W_1 and α_2 ∈ W_2 such that
α = α_1 + α_2. This much is true for any sum of two subspaces. If the sum is
direct, however, α_1 and α_2 are uniquely determined by α. For if α = α_1 + α_2 =
α_1′ + α_2′, then α_1 − α_1′ = α_2′ − α_2. Since the left side is in W_1 and the
right side is in W_2, both are in W_1 ∩ W_2. But this means α_1 − α_1′ = 0 and
α_2′ − α_2 = 0; that is, the decomposition of α into a sum of an element in W_1
plus an element in W_2 is unique. If V is the direct sum of W_1 and W_2, we say
that W_1 and W_2 are complementary and that W_2 is a complementary subspace
of W_1, or a complement of W_1.
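
The uniqueness of the decomposition α = α_1 + α_2 becomes a concrete computation once bases of W_1 and W_2 are chosen: the combined basis vectors form a nonsingular matrix, and solving one linear system produces α_1 and α_2. A minimal sketch, assuming Python with NumPy and a pair of complementary subspaces of R^3 invented for the illustration:

import numpy as np

B1 = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)   # basis of W1
B2 = np.array([[1, 1, 1]], dtype=float)              # basis of W2;  W1 and W2 intersect only in 0
alpha = np.array([2.0, 3.0, 5.0])

coords = np.linalg.solve(np.vstack([B1, B2]).T, alpha)   # coordinates in the combined basis
alpha1 = coords[:2] @ B1                                 # the component in W1
alpha2 = coords[2:] @ B2                                 # the component in W2
print(alpha1, alpha2, np.allclose(alpha1 + alpha2, alpha))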

The notion of a direct sum can be extended to a sum of any finite number
of subspaces. The sum W_1 + ··· + W_k is said to be direct if for each i,
W_i ∩ (Σ_{j≠i} W_j) = {0}. If the sum of several subspaces is direct, we use the
notation W_1 ⊕ W_2 ⊕ ··· ⊕ W_k. In this case, too, α ∈ W_1 ⊕ ··· ⊕ W_k
can be expressed uniquely in the form α = Σ_i α_i, α_i ∈ W_i.

Theorem 4.9. If W is a subspace of V there exists a subspace W′ such that

V = W ⊕ W′.

proof. Let {α_1, ..., α_m} be a basis of W. Extend this linearly independent
set to a basis {α_1, ..., α_m, α_{m+1}, ..., α_n} of V. Let W′ be the subspace
spanned by {α_{m+1}, ..., α_n}. Clearly, W ∩ W′ = {0} and the sum

V = W + W′ is direct. □

Thus every subspace of a finite dimensional vector space has a complementary
subspace. The complement is not unique, however. If for W there
exists a subspace W′ such that V = W ⊕ W′, we say that W is a direct
summand of V.

Theorem 4.10. For a sum of several subspaces of a finite dimensional
vector space to be direct it is necessary and sufficient that dim (W_1 + ··· +
W_k) = dim W_1 + ··· + dim W_k.

proof. This is an immediate consequence of Theorem 4.8 and the prin- 
ciple of mathematical induction. □ 

EXERCISES 

1 . Let P be the space of all polynomials with real coefficients. Determine which 
of the following subsets of P are subspaces. 

(a) {p(x) | p(1) = 0}.

(b) {p(x) | the constant term of p(x) = 0}.

(c) {p(x) | degree of p(x) = 3}.
(d) {p(x) | degree of p(x) < 3}.

(Strictly speaking, the zero polynomial does not have a degree associated with it. 
It is sometimes convenient to agree that the zero polynomial has degree less than 
any integer, positive or negative. With this convention the zero polynomial is 
included in the set described above, and it is not necessary to add a separate 
comment to include it.) 

(e) {p(x) | degree of p(x) is even} ∪ {0}.

2. Determine which of the following subsets of R^n are subspaces.

(a) {(x_1, x_2, ..., x_n) | x_1 = 0}.

(b) {(x_1, x_2, ..., x_n) | x_1 ≥ 0}.

(c) {(x_1, x_2, ..., x_n) | x_1 + 2x_2 = 0}.

(d) {(x_1, x_2, ..., x_n) | x_1 + 2x_2 = 1}.

(e) {(x_1, x_2, ..., x_n) | x_1 + 2x_2 ≥ 0}.

(f) {(x_1, x_2, ..., x_n) | m_i ≤ x_i ≤ M_i, i = 1, 2, ..., n, where the m_i and M_i
are constants}.

(g) {(x_1, x_2, ..., x_n) | x_1 = x_2 = ··· = x_n}.

3. What is the essential difference between the condition used to define the 
subset in (c) of Exercise 2 and the condition used in (d) ? Is the lack of a non-zero 
constant term important in (c) ? 

4. What is the essential difference between the condition used to define the
subset in (c) of Exercise 2 and the condition used in (e)? What, in general, are the
differences between the conditions in (a), (c), and (g) and those in (b), (e), and (f)?

5. Show that {(1, 1,0, 0), (1, 0, 1, 1)} and {(2, -1, 3, 3), (0, 1, -1, -1)} span 
the same subspace of R 4 . 

6. Let W be the subspace of R 5 spanned by {(1,1,1,1,1), (1,0,1,0,1), 
(0,1,1,1,0), (2,0,0,1,1), (2,1,1,2,1), (1, -1, -1, -2,2), (1,2,3,4, -1)}. 
Find a basis for W and the dimension of W. 

7. Show that {(1, -1,2, -3), (1, 1, 2, 0), (3, -1,6, -6)} and {(1,0,1,0), 
(0, 2, 0, 3)} do not span the same subspace. 

8. Let W_1 = ⟨(1, 2, 3, 6), (4, −1, 3, 6), (5, 1, 6, 12)⟩ and W_2 = ⟨(1, −1, 1, 1),
(2, −1, 4, 5)⟩ be subspaces of R^4. Find bases for W_1 ∩ W_2 and W_1 + W_2. Extend
the basis of W_1 ∩ W_2 to a basis of W_1, and extend the basis of W_1 ∩ W_2 to a basis
of W_2. From these bases obtain a basis of W_1 + W_2.

9. Let P be the space of all polynomials with real coefficients, and let W_1 =
{p(x) | p(1) = 0} and W_2 = {p(x) | p(−1) = 0}. Determine W_1 ∩ W_2 and W_1 + W_2.
(These spaces are infinite dimensional and the student is not expected to find 
bases for these subspaces. What is expected is a simple criterion or description 
of these subspaces.) 

10. We have already seen (Section 1, Exercise 11) that the real numbers form
a vector space over the rationals. Show that {1, √2} and {1 − √2, 1 + √2}
span the same subspace.

11. Show that if W_1 and W_2 are subspaces, then W_1 ∪ W_2 is not a subspace
unless one is a subspace of the other.

12. Show that the set of all vectors (x_1, x_2, x_3, x_4) ∈ R^4 satisfying the equations

3x_1 − 2x_2 − x_3 − 4x_4 = 0
x_1 + x_2 − 2x_3 − 3x_4 = 0

is a subspace of R^4. Find a basis for this subspace. (Hint: Solve the equations for
x_1 and x_2 in terms of x_3 and x_4. Then specify various values for x_3 and x_4 to obtain
as many linearly independent vectors as are needed.)

13. Let S, T, and T* be three subspaces of V (of finite dimension) for which
(a) S ∩ T = S ∩ T*, (b) S + T = S + T*, (c) T ⊆ T*. Show that T = T*.

14. Show by example that it is possible to have S ⊕ T = S ⊕ T* without having
T = T*.

15. If V = W_1 ⊕ W_2 and W is any subspace of V such that W_1 ⊆ W, show that
W = (W ∩ W_1) ⊕ (W ∩ W_2). Show by an example that the condition W_1 ⊆ W
(or W_2 ⊆ W) is necessary.



chapter II

Linear transformations and matrices



In this chapter we define linear transformations and various operations: 
addition of two linear transformations, multiplication of two linear trans- 
formations, and multiplication of a linear transformation by a scalar. 
Linear transformations are functions of vectors in one vector space U 
with values which are vectors in the same or another vector space V which 
preserve linear combinations. They can be represented by matrices in the 
same sense that vectors can be represented by n-tuples. This representation
requires that operations of addition, multiplication, and scalar multiplication 
of matrices be defined to correspond to these operations with linear trans- 
formations. Thus we establish an algebra of matrices by means of the 
conceptually simpler algebra of linear transformations. 

The matrix representing a linear transformation of U into V depends on 
the choice of a basis in U and a basis in V. Our first problem, a recurrent 
problem whenever matrices are used to represent anything, is to see how a 
change in the choice of bases determines a corresponding change in the 
matrix representing the linear transformation. Two matrices which represent 
the same linear transformation with respect to different sets of bases must 
have some properties in common. This leads to the idea of equivalence 
relations among matrices. The exact nature of this equivalence relation 
depends on the bases which are permitted. 

In this chapter no restriction is placed on the bases which are permitted 
and we obtain the widest kind of equivalence. In Chapter III we identify 
U and V and require that the same basis be used in both. This yields a 
more restricted kind of equivalence, and a study of this equivalence is both 
interesting and fruitful. In Chapter V we make further restrictions in the 
permissible bases and obtain an even more restricted equivalence. 

When no restriction is placed on the bases which are permitted, the 
equivalence is so broad that it is relatively uninteresting. Very useful results 
are obtained, however, when we are permitted to change basis only in the 
image space V. In every set of mutually equivalent matrices we select one, 
representative of all of them, which we call a normal form, in this case 
the Hermite normal form. The Hermite normal form is one of our most 
important and effective computational tools, far exceeding in utility its 
application to the study of this particular equivalence relation. 

The pattern we have described is worth conscious notice since it is re- 
current and the principal underlying theme in this exposition of matrix 
theory. We define a concept, find a representation suitable for effective 
computation, change bases to see how this change affects the representation, 
and then seek a normal form in each class of equivalent representations. 



1 | Linear Transformations

Let U and V be vector spaces over the same field of scalars F. 

Definition. A linear transformation σ of U into V is a single-valued mapping
of U into V which associates to each element α ∈ U a unique element σ(α) ∈ V
such that for all α, β ∈ U and all a, b ∈ F we have

σ(aα + bβ) = aσ(α) + bσ(β).     (1.1)

We call σ(α) the image of α under the linear transformation σ. If ᾱ ∈ V,
then any vector α ∈ U such that σ(α) = ᾱ is called an inverse image of ᾱ.
The set of all α ∈ U such that σ(α) = ᾱ is called the complete inverse image
of ᾱ, and it is denoted by σ⁻¹(ᾱ). Generally, σ⁻¹(ᾱ) need not be a single
element as there may be more than one α ∈ U such that σ(α) = ᾱ.

By taking particular choices for a and b we see that for a linear transformation
σ(α + β) = σ(α) + σ(β) and σ(aα) = aσ(α). Loosely speaking,
the image of the sum is the sum of the images and the image of the product
is the product of the images. This descriptive language has to be interpreted
generously since the operations before and after applying the linear transformation
may take place in different vector spaces. Furthermore, the remark
about scalar multiplication is inexact since we do not apply the linear transformation
to scalars; the linear transformation is defined only for vectors
in U. Even so, the linear transformation does preserve the structural
operations in a vector space and this is the reason for its importance. Generally,
in algebra a structure-preserving mapping is called a homomorphism.
To describe the special role of the elements of F in the condition σ(aα) =
aσ(α), we say that a linear transformation is a homomorphism over F, or an
F-homomorphism.

If for α ≠ β it necessarily follows that σ(α) ≠ σ(β), the homomorphism σ
is said to be one-to-one and it is called a monomorphism. If A is any subset of
U, σ(A) will denote the set of all images of elements of A; σ(A) = {ᾱ | ᾱ =
σ(α) for some α ∈ A}. σ(A) is called the image of A. σ(U) is often denoted by
Im(σ) and is called the image of σ. If Im(σ) = V we shall say that the homomorphism
is a mapping onto V and it is called an epimorphism.

We call the set U, on which the linear transformation σ is defined, the
domain of σ. We call V, the set in which the images of σ are defined, the
codomain of σ. Strictly speaking, a linear transformation must specify
the domain and codomain as well as the mapping. For example, consider 
the linear transformation that maps every vector of U onto the zero vector of 
V. This mapping is called the zero mapping. If W is any subspace of V, there is 
also a zero mapping of U into W, and this mapping has the same effect on the 
elements of U as the zero mapping of U into V. However, they are different 
linear transformations since they have different codomains. This may seem 
like an unnecessarily fine distinction. Actually, for most of this book we 
could get along without this degree of precision. But the more deeply we go 
into linear algebra the more such precision is needed. In this book we need 
this much care when we discuss dual spaces and dual transformations in 
Chapter IV. 

A homomorphism that is both an epimorphism and a monomorphism is
called an isomorphism. If ᾱ ∈ V, the fact that σ is an epimorphism says that
there is an α ∈ U such that σ(α) = ᾱ. The fact that σ is a monomorphism says
that this α is unique. Thus, for an isomorphism, we can define an inverse
mapping σ⁻¹ that maps ᾱ onto α.

Theorem 1.1. The inverse σ⁻¹ of an isomorphism is also an isomorphism.

proof. Since σ⁻¹ is obviously one-to-one and onto, it is necessary only
to show that it is linear. If ᾱ = σ(α) and β̄ = σ(β), then σ(aα + bβ) =
aᾱ + bβ̄ so that σ⁻¹(aᾱ + bβ̄) = aα + bβ = aσ⁻¹(ᾱ) + bσ⁻¹(β̄). □

For the inverse isomorphism σ⁻¹(ᾱ) is an element of U. This conflicts with
the previously given definition of σ⁻¹(ᾱ) as a complete inverse image, in which
σ⁻¹(ᾱ) is a subset of U. However, the symbol σ⁻¹, standing alone, will
always be used to denote an isomorphism, and in this case there is no difficulty
caused by the fact that σ⁻¹(ᾱ) might denote either an element or a one-element
set.

Let us give some examples of linear transformations. Let U = V = P,
the space of polynomials in x with coefficients in R. For α = Σ_{i=0}^n a_i x^i
define σ(α) = dα/dx = Σ_{i=1}^n i a_i x^{i−1}. In calculus one of the very first things
proved about the derivative is that it is linear: d(α + β)/dx = dα/dx + dβ/dx
and d(aα)/dx = a · dα/dx. The mapping τ(α) = Σ_{i=0}^n (a_i / (i + 1)) x^{i+1} is also linear. Notice
that this is not the indefinite integral since we have specified that the constant
of integration shall be zero. Notice that σ is onto but not one-to-one and τ
is one-to-one but not onto.
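
Both operators act in a transparent way on the list of coefficients of a polynomial, and it is instructive to model them that way. The sketch below is an illustration only (Python, with Σ a_i x^i stored as the list [a_0, a_1, ...]); it shows that σ applied after τ recovers the original polynomial, while τ applied after σ loses the constant term — the point of Exercise 8 at the end of this section.

def sigma(p):
    # differentiation on coefficient lists: [a0, a1, a2, ...] -> [a1, 2*a2, 3*a3, ...]
    return [i * a for i, a in enumerate(p)][1:] or [0]

def tau(p):
    # the antiderivative with constant of integration equal to zero
    return [0] + [a / (i + 1) for i, a in enumerate(p)]

p = [1, 2, 0, 4]                 # the polynomial 1 + 2x + 4x^3
print(sigma(tau(p)))             # [1.0, 2.0, 0.0, 4.0]  -> sigma tau is the identity
print(tau(sigma(p)))             # [0, 2.0, 0.0, 4.0]    -> constant term lost: tau sigma is not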

Let U = R^n and V = R^m with m ≤ n. For each α = (a_1, ..., a_n) ∈ R^n
define σ(α) = (a_1, ..., a_m) ∈ R^m. It is clear that this linear transformation
is one-to-one if and only if m = n, but it is onto. For each β = (b_1, ...,
b_m) ∈ R^m define τ(β) = (b_1, ..., b_m, 0, ..., 0) ∈ R^n. This linear transformation
is one-to-one, but it is onto if and only if m = n.

Let U = V. For a given scalar a ∈ F the mapping of α onto aα is linear since

a(α + β) = aα + aβ = a(α) + a(β),
and
a(bα) = (ab)α = (ba)α = b · a(α).

To simplify notation we also denote this linear transformation by a. Linear
transformations of this type are called scalar transformations, and there is a
one-to-one correspondence between the field of scalars and the set of scalar
transformations. In particular, the linear transformation that leaves every
vector fixed is denoted by 1. It is called the identity transformation or unit
transformation. If linear transformations in several vector spaces are being
discussed at the same time, it may be desirable to identify the space on which
the identity transformation is defined. Thus 1_U will denote the identity
transformation on U.

When a basis of a finite dimensional vector space V is used to establish a
correspondence between vectors in V and n-tuples in F^n, this correspondence
is an isomorphism. The required arguments have already been given in
Section I-3. Since V and F^n are isomorphic, it is theoretically possible to
discuss the properties of V by examining the properties of F^n. However, there
is much interest and importance attached to concepts that are independent
of the choice of a basis. If a homomorphism or isomorphism can be defined
uniquely by intrinsic properties independent of a choice of basis, the mapping is
said to be natural or canonical. In particular, any two vector spaces of
dimension n over F are isomorphic. Such an isomorphism can be established
by setting up an isomorphism between each one and F^n. This isomorphism
will be dependent on a choice of a basis in each space. Such an isomorphism,
dependent upon the arbitrary choice of bases, is not canonical.

Next, let us define the various operations between linear transformations.
For each pair σ, τ of linear transformations of U into V, define σ + τ by the
rule

(σ + τ)(α) = σ(α) + τ(α)   for all α ∈ U.

σ + τ is a linear transformation since

(σ + τ)(aα + bβ) = σ(aα + bβ) + τ(aα + bβ) = aσ(α) + bσ(β)
+ aτ(α) + bτ(β) = a[σ(α) + τ(α)] + b[σ(β) + τ(β)]
= a(σ + τ)(α) + b(σ + τ)(β).

Observe that addition of linear transformations is commutative; σ + τ =
τ + σ.

For each linear transformation σ and a ∈ F define aσ by the rule (aσ)(α) =
a[σ(α)]. aσ is a linear transformation.

It is not difficult to show that with these two operations the set of all linear 
transformations of U into V is itself a vector space over F. This is a very 
important fact and we occasionally refer to it and make use of it. However, 
we wish to emphasize that we define the sum of two linear transformations 
if and only if they both have the same domain and the same codomain. 
It is neither necessary nor sufficient that they have the same image, or that the 
image of one be a subset of the image of the other. It is simply a question of 
being clear about the terminology and its meaning. The set of all linear 
transformations of U into V will be denoted by Hom(U, V). 

There is another, entirely new, operation that we need to define. Let W
be a third vector space over F. Let σ be a linear transformation of U into V
and τ a linear transformation of V into W. By τσ we denote the linear transformation
of U into W defined by the rule: (τσ)(α) = τ[σ(α)]. Notice that
in this context στ has no meaning. We refer to this operation as either
iteration or multiplication of linear transformations, and τσ is called the
product of τ and σ.

The operations between linear transformations are related by the following
rules:

1. Multiplication is associative: π(τσ) = (πτ)σ. Here π is a linear
transformation of W into a fourth vector space X.

2. Multiplication is distributive with respect to addition:

(τ_1 + τ_2)σ = τ_1σ + τ_2σ and τ(σ_1 + σ_2) = τσ_1 + τσ_2.

3. Scalar multiplication commutes with multiplication: a(τσ) = τ(aσ).
These properties are easily proved and are left to the reader.

Notice that if W ≠ U, then τσ is defined but στ is not. If all linear transformations
under consideration are mappings of a vector space U into itself,
then these linear transformations can be multiplied in any order. This means
that τσ and στ would both be defined, but it would not mean that τσ = στ.

The set of linear transformations of a vector space into itself is a vector
space, as we have already observed, and now we have defined a product
which satisfies the three conditions given above. Such a space is called an
associative algebra. In our case the algebra consists of linear transformations
and it is known as a linear algebra. However, the use of terms is always in
a state of flux, and today this term is used in a more inclusive sense. When
referring to a particular set with an algebraic structure, "linear algebra"
still denotes what we have just described. But when referring to an area of
study, the term "linear algebra" includes virtually every concept in which
linear transformations play a role, including linear transformations between
different vector spaces (in which the linear transformations cannot always
be multiplied), sequences of vector spaces, and even mappings of sets of
linear transformations (since they also have the structure of a vector space).

Theorem 1.2. Im(σ) is a subspace of V.

proof. If ᾱ and β̄ are elements of Im(σ), there exist α, β ∈ U such that
σ(α) = ᾱ and σ(β) = β̄. For any a, b ∈ F, σ(aα + bβ) = aσ(α) + bσ(β) =
aᾱ + bβ̄ ∈ Im(σ). Thus Im(σ) is a subspace of V. □

Corollary 1.3. If U_1 is a subspace of U, then σ(U_1) is a subspace of V. □

It follows from this corollary that σ(0) = 0, where 0 denotes both the zero vector
of U and the zero vector of V. It is even easier, however, to show it directly.
Since σ(0) = σ(0 + 0) = σ(0) + σ(0), it follows from the uniqueness of the
zero vector that σ(0) = 0.

For the rest of this book, unless specific comment is made, we assume 
that all vector spaces under consideration are finite dimensional. Let 
dim U = n and dim V = m. 

The dimension of the subspace Im(σ) is called the rank of the linear transformation
σ. The rank of σ is denoted by ρ(σ).

Theorem 1.4. ρ(σ) ≤ min {m, n}.

proof. If {α_1, ..., α_s} is linearly dependent in U, there exists a non-trivial
relation of the form Σ_i a_i α_i = 0. But then Σ_i a_i σ(α_i) = σ(0) = 0; that is,
{σ(α_1), ..., σ(α_s)} is linearly dependent in V. A linear transformation
preserves linear relations and transforms dependent sets into dependent
sets. Thus, there can be no more than n linearly independent elements in
Im(σ). In addition, Im(σ) is a subspace of V so that dim Im(σ) ≤ m. Thus
ρ(σ) = dim Im(σ) ≤ min {m, n}. □

Theorem 1.5. If W is a subspace of V, the set σ⁻¹(W) of all α ∈ U such that
σ(α) ∈ W is a subspace of U.

proof. If α, β ∈ σ⁻¹(W), then σ(aα + bβ) = aσ(α) + bσ(β) ∈ W. Thus
aα + bβ ∈ σ⁻¹(W) and σ⁻¹(W) is a subspace. □

The subspace K(σ) = σ⁻¹(0) is called the kernel of the linear transformation
σ. The dimension of K(σ) is called the nullity of σ. The nullity of σ is denoted
by ν(σ).

Theorem 1.6. ρ(σ) + ν(σ) = n.

proof. Let {α_1, ..., α_ν, β_1, ..., β_k} be a basis of U such that {α_1, ..., α_ν}
is a basis of K(σ). For α = Σ_i a_i α_i + Σ_j b_j β_j ∈ U we see that σ(α) =
Σ_i a_i σ(α_i) + Σ_j b_j σ(β_j) = Σ_j b_j σ(β_j). Thus {σ(β_1), ..., σ(β_k)} spans Im(σ).
On the other hand if Σ_j c_j σ(β_j) = 0, then σ(Σ_j c_j β_j) = Σ_j c_j σ(β_j) = 0; that
is, Σ_j c_j β_j ∈ K(σ). In this case there exist coefficients d_i such that Σ_j c_j β_j =
Σ_i d_i α_i. If any of these coefficients were non-zero we would have a non-trivial
relation among the elements of {α_1, ..., α_ν, β_1, ..., β_k}. Hence, all
c_j = 0 and {σ(β_1), ..., σ(β_k)} is linearly independent. But then it is a basis
of Im(σ) so that k = ρ(σ). Thus ρ(σ) + ν(σ) = n. □
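
When a linear transformation of R^n into R^m is given by a matrix (as in Section 2 below), Theorem 1.6 can be verified directly: the rank is the dimension of the image, and the nullity is n minus the rank. A small illustrative sketch, assuming Python with NumPy and a matrix invented for the purpose:

import numpy as np

A = np.array([[1, 2, 3],
              [2, 4, 6]], dtype=float)   # a map of R^3 into R^2; second row = 2 x first row

rank = np.linalg.matrix_rank(A)          # rho(sigma) = dim Im(sigma) = 1
nullity = A.shape[1] - rank              # nu(sigma) = 2
print(rank, nullity, rank + nullity == A.shape[1])   # 1 2 True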

Theorem 1.6 has an important geometric interpretation. Suppose that
a 3-dimensional vector space R^3 were mapped onto a 2-dimensional vector
space R^2. In this case, it is simplest and sufficiently accurate to think of σ
as the linear transformation which maps (a_1, a_2, a_3) ∈ R^3 onto (a_1, a_2) ∈ R^2,
which we can identify with (a_1, a_2, 0) ∈ R^3. Since ρ(σ) = 2, ν(σ) = 1.
Clearly, every point (0, 0, a_3) on the x_3-axis is mapped onto the origin.
Thus K(σ) is the x_3-axis, the line through the origin in the direction of the
projection, and {(0, 0, 1) = α_1} is a basis of K(σ). It should be evident
that any plane through the origin not containing K(σ) will be projected onto
the x_1x_2-plane and that this mapping is one-to-one and onto. Thus the complementary
subspace ⟨β_1, β_2⟩ can be taken to be any plane through the origin
not containing the x_3-axis. This illustrates the wide latitude of choice possible
for the complementary subspace ⟨β_1, ..., β_k⟩.

Theorem 1.7. A linear transformation σ of U into V is a monomorphism if
and only if ν(σ) = 0, and it is an epimorphism if and only if ρ(σ) = dim V.

proof. K(σ) = {0} if and only if ν(σ) = 0. If σ is a monomorphism, then
certainly K(σ) = {0} and ν(σ) = 0. On the other hand, if ν(σ) = 0 and
σ(α) = σ(β), then σ(α − β) = 0 so that α − β ∈ K(σ) = {0}. Thus, if
ν(σ) = 0, σ is a monomorphism.

It is but a matter of reading the definitions to see that σ is an epimorphism
if and only if ρ(σ) = dim V. □

If dim U = n < dim V = m, then ρ(σ) = n − ν(σ) ≤ n < m so that
σ cannot be an epimorphism. If n > m, then ν(σ) = n − ρ(σ) ≥ n − m > 0,
so that σ cannot be a monomorphism. Any linear transformation from a
vector space into a vector space of higher dimension must fail to be an
epimorphism. Any linear transformation from a vector space into a vector
space of lower dimension must fail to be a monomorphism.

Theorem 1.8. Let U and V have the same finite dimension n. A linear
transformation σ of U into V is an isomorphism if and only if it is an epimorphism.
σ is an isomorphism if and only if it is a monomorphism.

proof. It is part of the definition of an isomorphism that it is both an
epimorphism and a monomorphism. Suppose σ is an epimorphism. Then ρ(σ) =
n and ν(σ) = 0 by Theorem 1.6. Hence, σ is a monomorphism. Conversely,
if σ is a monomorphism, then ν(σ) = 0 and, by Theorem 1.6, ρ(σ) = n.
Hence, σ is an epimorphism. □

Thus a linear transformation σ of U into V is an isomorphism if two of the
following three conditions are satisfied: (1) dim U = dim V, (2) σ is an
epimorphism, (3) σ is a monomorphism.

Theorem 1.9. ρ(σ) = ρ(τσ) + dim (Im(σ) ∩ K(τ)).

proof. Let τ′ be a new linear transformation defined on Im(σ), mapping
Im(σ) into W, so that for all α ∈ Im(σ), τ′(α) = τ(α). Then K(τ′) = Im(σ) ∩
K(τ) and ρ(τ′) = dim τ[Im(σ)] = dim τσ(U) = ρ(τσ). Then Theorem 1.6
takes the form

ρ(τ′) + ν(τ′) = dim Im(σ),

or

ρ(τσ) + dim (Im(σ) ∩ K(τ)) = ρ(σ). □

Corollary 1.10. ρ(τσ) = dim (Im(σ) + K(τ)) − ν(τ).
proof. This follows from Theorem 1.9 by application of Theorem 4.8
of Chapter I. □

Corollary 1.11. If K(τ) ⊆ Im(σ), then ρ(σ) = ρ(τσ) + ν(τ). □

Theorem 1.12. The rank of a product of linear transformations is less than
or equal to the rank of either factor: ρ(τσ) ≤ min {ρ(τ), ρ(σ)}.

proof. The rank of τσ is the dimension of τ[σ(U)] ⊆ τ(V). Thus, considering
dim σ(U) as the "n" and dim τ(V) as the "m" of Theorem 1.4, we see that
dim τσ(U) = ρ(τσ) ≤ min {dim σ(U), dim τ(V)} = min {ρ(σ), ρ(τ)}. □
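
Once linear transformations are represented by matrices, Theorem 1.12 can be observed on examples: the rank of a matrix product never exceeds the rank of either factor. A hedged sketch (Python with NumPy; the randomly chosen factors are only for illustration):

import numpy as np

rng = np.random.default_rng(0)
tau = rng.integers(-3, 4, size=(4, 5)).astype(float)     # represents tau: V -> W
sigma = rng.integers(-3, 4, size=(5, 6)).astype(float)   # represents sigma: U -> V

r = np.linalg.matrix_rank
print(r(tau @ sigma) <= min(r(tau), r(sigma)))           # True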

Theorem 1.13. If σ is an epimorphism, then ρ(τσ) = ρ(τ). If τ is a monomorphism,
then ρ(τσ) = ρ(σ).

proof. If σ is an epimorphism, then K(τ) ⊆ Im(σ) = V and Corollary
1.11 applies. Thus ρ(τσ) = ρ(σ) − ν(τ) = m − ν(τ) = ρ(τ). If τ is a
monomorphism, then K(τ) = {0} ⊆ Im(σ) and Corollary 1.11 applies.
Thus ρ(τσ) = ρ(σ) − ν(τ) = ρ(σ). □

Corollary 1.14. The rank of a linear transformation is not changed by
multiplication by an isomorphism (on either side). □

Theorem 1.15. σ is an epimorphism if and only if τσ = 0 implies τ = 0.
τ is a monomorphism if and only if τσ = 0 implies σ = 0.

proof. Suppose σ is an epimorphism. Assume τσ is defined and τσ = 0.
If τ ≠ 0, there is a β ∈ V such that τ(β) ≠ 0. Since σ is an epimorphism,
there is an α ∈ U such that σ(α) = β. Then τσ(α) = τ(β) ≠ 0. This is a
contradiction and hence τ = 0. Now, suppose τσ = 0 implies τ = 0. If σ is
not an epimorphism then Im(σ) is a subspace of V but Im(σ) ≠ V. Let
{β_1, ..., β_r} be a basis of Im(σ), and extend this independent set to a basis
{β_1, ..., β_r, ..., β_m} of V. Define τ(β_i) = β_i for i > r and τ(β_i) = 0 for
i ≤ r. Then τσ = 0 and τ ≠ 0. This is a contradiction and, hence, σ is an
epimorphism.

Now, assume τσ is defined and τσ = 0. Suppose τ is a monomorphism.
If σ ≠ 0, there is an α ∈ U such that σ(α) ≠ 0. Since τ is a monomorphism,
τσ(α) ≠ 0. This is a contradiction and, hence, σ = 0. Now assume τσ = 0
implies σ = 0. If τ is not a monomorphism there is an α ∈ V such that α ≠ 0
and τ(α) = 0. Let {α_1, ..., α_n} be any basis of U. Define σ(α_i) = α for each
i. Then τσ(α_i) = τ(α) = 0 for all i and τσ = 0. This is a contradiction and,
hence, τ is a monomorphism. □

Corollary 1.16. σ is an epimorphism if and only if τ_1σ = τ_2σ implies
τ_1 = τ_2. τ is a monomorphism if and only if τσ_1 = τσ_2 implies σ_1 = σ_2.

The statement that τ_1σ = τ_2σ implies τ_1 = τ_2 is called a right-cancellation,
and the statement that τσ_1 = τσ_2 implies σ_1 = σ_2 is called a left-cancellation.
Thus, an epimorphism is a linear transformation that can be cancelled on the
right, and a monomorphism is a linear transformation that can be cancelled
on the left.

Theorem 1.17. Let A = {α_1, ..., α_n} be any basis of U. Let B = {β_1, ...,
β_n} be any n vectors in V (not necessarily linearly independent). There exists
a uniquely determined linear transformation σ of U into V such that σ(α_i) = β_i
for i = 1, 2, ..., n.

proof. Since A is a basis of U, any vector α ∈ U can be expressed uniquely
in the form α = Σ_{i=1}^n a_i α_i. If σ is to be linear we must have

σ(α) = Σ_{i=1}^n a_i σ(α_i) = Σ_{i=1}^n a_i β_i ∈ V.

It is a simple matter to verify that the mapping so defined is linear. □

Corollary 1.18. Let C = {γ_1, ..., γ_r} be any linearly independent set in U,
where U is finite dimensional. Let D = {δ_1, ..., δ_r} be any r vectors in V.
There exists a linear transformation σ of U into V such that σ(γ_i) = δ_i for
i = 1, ..., r.

proof. Extend C to a basis of U. Define σ(γ_i) = δ_i for i = 1, ..., r,
and define the values of σ on the other elements of the basis arbitrarily. This
will yield a linear transformation σ with the desired properties. □

It should be clear that, if C is not already a basis, there are many ways to
define σ. It is worth pointing out that the independence of the set C is crucial
to proving the existence of the linear transformation with the desired prop- 
erties. Otherwise, a linear relation among the elements of C would impose 
a corresponding linear relation among the elements of D, which would mean 
that D could not be arbitrary. 

Theorem 1.17 establishes, for one thing, that linear transformations really 
do exist. Moreover, they exist in abundance. The real utility of this theorem 
and its corollary is that it enables us to establish the existence of a linear 
transformation with some desirable property with great convenience. All 
we have to do is to define this function on an independent set. 

Definition. A linear transformation π of V into itself with the property that
π² = π is called a projection.

Theorem 1.19. If π is a projection of V into itself, then V = Im(π) ⊕ K(π)
and π acts like the identity on Im(π).

proof. For α ∈ V, let α_1 = π(α). Then π(α_1) = π²(α) = π(α) = α_1. This
shows that π acts like the identity on Im(π). Let α_2 = α − α_1. Then π(α_2) =
π(α) − π(α_1) = α_1 − α_1 = 0. Thus α = α_1 + α_2 where α_1 ∈ Im(π) and
α_2 ∈ K(π). Clearly, Im(π) ∩ K(π) = {0}. □




Fig. 1 

If S = Im(π) and T = K(π), we say that π is a projection of V onto S along
T. In the case where V is the real plane, Fig. 1 indicates the interpretation of
these words: α is projected onto a point of S in a direction parallel to T.
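
In coordinates a projection is a matrix P with P² = P, and the proof of Theorem 1.19 is exactly the computation below. The matrix used here is our own small example (Python with NumPy assumed): it projects the plane onto the x_1-axis along the line x_1 = x_2.

import numpy as np

P = np.array([[1., -1.],
              [0.,  0.]])        # P maps (x1, x2) to (x1 - x2, 0); note that P @ P equals P

alpha = np.array([3.0, 1.0])
alpha1 = P @ alpha               # the component in Im(pi): (2, 0)
alpha2 = alpha - alpha1          # the component in K(pi):  (1, 1)
print(np.allclose(P @ P, P), alpha1, alpha2, np.allclose(P @ alpha1, alpha1))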

EXERCISES 

1. Show that σ((x_1, x_2)) = (x_2, x_1) defines a linear transformation of R^2 into
itself.

2. Let σ_1((x_1, x_2)) = (x_2, −x_1) and σ_2((x_1, x_2)) = (x_1, −x_2). Determine σ_1 + σ_2,
σ_1σ_2 and σ_2σ_1.

3. Let U = V = R^n and let σ((x_1, x_2, ..., x_n)) = (x_1, x_2, ..., x_k, 0, ..., 0)
where k < n. Describe Im(σ) and K(σ).

4. Let σ((x_1, x_2, x_3, x_4)) = (3x_1 − 2x_2 − x_3 − 4x_4, x_1 + x_2 − 2x_3 − 3x_4). Show
that σ is a linear transformation. Determine the kernel of σ.

5. Let σ((x_1, x_2, x_3)) = (2x_1 + x_2 + 3x_3, 3x_1 − x_2 + x_3, −4x_1 + 3x_2 + x_3). Find
a basis of σ(U). (Hint: Take particular values of the x_i to find a spanning set for
σ(U).) Find a basis of K(σ).

6. Let D denote the operator of differentiation,

D(y) = dy/dx,   D²(y) = D[D(y)] = d²y/dx²,   etc.

Show that D^n is a linear transformation, and also that p(D) is a linear transformation
if p(D) is a polynomial in D with constant coefficients. (Here we must assume
that the space of functions on which D is defined contains only functions differentiable
at least as often as the degree of p(D).)

7. Let U = V and let σ and τ be linear transformations of U into itself. In this
case στ and τσ are both defined. Construct an example to show that it is not
always true that στ = τσ.

8. Let U = V = P, the space of polynomials in x with coefficients in R. For
α = Σ_{i=0}^n a_i x^i let

σ(α) = Σ_{i=1}^n i a_i x^{i−1},

and

τ(α) = Σ_{i=0}^n (a_i / (i + 1)) x^{i+1}.

Show that στ = 1, but that τσ ≠ 1.

9. Show that if two scalar transformations coincide on U then the defining 
scalars are equal. 

10. Let σ be a linear transformation of U into V and let A = {α_1, ..., α_n} be a
basis of U. Show that if the values {σ(α_1), ..., σ(α_n)} are known, then the value
of σ(α) can be computed for each α ∈ U.

11. Let U and V be vector spaces of dimensions n and m, respectively, over the
same field F. We have already commented that the set of all linear transformations
of U into V forms a vector space. Give the details of the proof of this assertion.
Let A = {α_1, ..., α_n} be a basis of U and B = {β_1, ..., β_m} be a basis of V. Let σ_ij
be the linear transformation of U into V such that

σ_ij(α_k) = 0 if k ≠ j,
σ_ij(α_k) = β_i if k = j.

Show that {σ_ij | i = 1, ..., m; j = 1, ..., n} is a basis of this vector space.

For the following sequence of problems let dim U = n and dim V = m. Let σ
be a linear transformation of U into V and τ a linear transformation of V into W.

12. Show that ρ(σ) ≤ ρ(τσ) + ν(τ). (Hint: Let V′ = σ(U) and apply Theorem
1.6 to τ defined on V′.)

13. Show that max {0, ρ(σ) + ρ(τ) − m} ≤ ρ(τσ) ≤ min {ρ(τ), ρ(σ)}.

14. Show that max {n − m + ν(τ), ν(σ)} ≤ ν(τσ) ≤ min {n, ν(σ) + ν(τ)}. (For
m = n this inequality is known as Sylvester's law of nullity.)

15. Show that if ν(τ) = 0, then ρ(τσ) = ρ(σ).

16. It is not generally true that ν(σ) = 0 implies ρ(τσ) = ρ(τ). Construct an
example to illustrate this fact. (Hint: Let m be very large.)

17. Show that if m = n and ν(σ) = 0, then ρ(τσ) = ρ(τ).

18. Show that if σ_1 and σ_2 are linear transformations of U into V, then

ρ(σ_1 + σ_2) ≤ min {m, n, ρ(σ_1) + ρ(σ_2)}.

19. Show that |ρ(σ_1) − ρ(σ_2)| ≤ ρ(σ_1 + σ_2).

20. If S is any subspace of V there is a subspace T such that V = S ⊕ T. Then
every α ∈ V can be represented uniquely in the form α = α_1 + α_2 where α_1 ∈ S and
α_2 ∈ T. Show that the mapping π which maps α onto α_1 is a linear transformation.
Show that T is the kernel of π. Show that π² = π. The mapping π is called a
projection of V onto S along T.

21. (Continuation) Let n be a projection. Show that 1 — tt is also a projection. 
What is the kernel of 1 — -nl Onto what subspace is 1 — tt a projection? Show 
that n(l - n) =0. 



2 | Matrices

Definition. A matrix over a field F is a rectangular array of scalars. The
array will be written in the form

    [ a_11  a_12  ⋯  a_1n ]
    [ a_21  a_22  ⋯  a_2n ]
    [  ⋮     ⋮         ⋮  ]                                     (2.1)
    [ a_m1  a_m2  ⋯  a_mn ]

whenever we wish to display all the elements in the array or show the form
of the array. A matrix with m rows and n columns is called an m × n
matrix. An n × n matrix is said to be of order n.

We often abbreviate a matrix written in the form above to [a_ij], where
the first index denotes the number of the row and the second index denotes
the number of the column. The particular letter appearing in each index
position is immaterial; it is the position that is important. With this con-
vention a_ij is a scalar and [a_ij] is a matrix. Whereas the elements a_ij and a_kl
need not be equal, we consider the matrices [a_ij] and [a_kl] to be identical
since both [a_ij] and [a_kl] stand for the entire matrix. As a further convenience
we often use upper case Latin italic letters to denote matrices; A = [a_ij].
Whenever we use lower case Latin italic letters to denote the scalars appearing
in the matrix, we use the corresponding upper case Latin italic letter to denote
the matrix. The matrix in which all scalars are zero is denoted by 0 (the third
use of this symbol!). The a_ij appearing in the array [a_ij] are called the
elements of [a_ij]. Two matrices are equal if and only if they have exactly
the same elements. The main diagonal of the matrix [a_ij] is the set of elements
{a_11, ..., a_ll} where l = min {m, n}. A diagonal matrix is a square matrix
in which the elements not in the main diagonal are zero.

Matrices can be used to represent a variety of different mathematical con- 
cepts. The way matrices are manipulated depends on the objects which they 
represent. Considering the wide variety of situations in which matrices have 
found application, there is a remarkable similarity in the operations performed 
on matrices in these situations. There are differences too, however, and to 
understand these differences we must understand the object represented and 
what information can be expected by manipulating with the matrices. We 
first investigate the properties of matrices as representations of linear trans- 
formations. Not only do the matrices provide us with a convenient means of 
doing whatever computation is necessary with linear transformations, but the 
theory of vector spaces and linear transformations also proves to be a power- 
ful tool in developing the properties of matrices. 

Let U be a vector space of dimension n and V a vector space of dimension
m, both over the same field F. Let A = {α₁, ..., αₙ} be an arbitrary but
fixed basis of U, and let B = {β₁, ..., β_m} be an arbitrary but fixed basis
of V. Let σ be a linear transformation of U into V. Since σ(α_j) ∈ V, σ(α_j)
can be expressed uniquely as a linear combination of the elements of B;

    σ(α_j) = Σ_{i=1}^m a_ij β_i.                                (2.2)

We define the matrix representing σ with respect to the bases A and B to be
the matrix A = [a_ij].

The correspondence between linear transformations and matrices is
actually one-to-one and onto. Given the linear transformation σ, the a_ij
exist because B spans V, and they are unique because B is linearly independent.
On the other hand, let A = [a_ij] be any m × n matrix. We can define
σ(α_j) = Σ_{i=1}^m a_ij β_i for each α_j ∈ A, and then we can extend the proposed
linear transformation to all of U by the condition that it be linear. Thus,
if ξ = Σ_{j=1}^n x_j α_j, we define

    σ(ξ) = Σ_{j=1}^n x_j σ(α_j) = Σ_{j=1}^n x_j ( Σ_{i=1}^m a_ij β_i )
         = Σ_{i=1}^m ( Σ_{j=1}^n a_ij x_j ) β_i.                (2.3)






σ can be extended to all of U because A spans U, and the result is well defined
(unique) because A is linearly independent.

Here are some examples of linear transformations and the matrices which
represent them. Consider the real plane R² = U = V. Let A = B = {(1, 0),
(0, 1)}. A 90° rotation counterclockwise would send (1, 0) onto (0, 1) and
it would send (0, 1) onto (−1, 0). Since σ((1, 0)) = 0 · (1, 0) + 1 · (0, 1) and
σ((0, 1)) = (−1) · (1, 0) + 0 · (0, 1), σ is represented by the matrix

    [ 0  −1 ]
    [ 1   0 ]

The elements appearing in a column are the coordinates of the image of a
basis vector under the transformation.

In general, a rotation counterclockwise through an angle of θ will send
(1, 0) onto (cos θ, sin θ) and (0, 1) onto (−sin θ, cos θ). Thus this rotation
is represented by

    [ cos θ  −sin θ ]
    [ sin θ   cos θ ]                                           (2.4)
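For readers who want to experiment, here is a small numerical sketch (it assumes the numpy library; the helper name representing_matrix is our own and not part of the text). It rebuilds the matrix of a linear transformation column by column from the images of the basis vectors, and with θ = π/2 it reproduces the 90° rotation matrix above.

```python
import numpy as np

def representing_matrix(sigma, basis_U, basis_V):
    """Build the matrix of a linear map sigma with respect to the given bases.

    Column j holds the coordinates of sigma(alpha_j) in terms of basis_V,
    exactly as in equation (2.2)."""
    B = np.column_stack(basis_V)             # basis vectors of V as columns
    cols = []
    for alpha in basis_U:
        image = sigma(alpha)
        coords = np.linalg.solve(B, image)   # coordinates of the image w.r.t. basis_V
        cols.append(coords)
    return np.column_stack(cols)

theta = np.pi / 2                            # the 90-degree rotation of the example
rot = lambda v: np.array([np.cos(theta) * v[0] - np.sin(theta) * v[1],
                          np.sin(theta) * v[0] + np.cos(theta) * v[1]])
E = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(representing_matrix(rot, E, E))        # approximately [[0, -1], [1, 0]]
```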



Suppose now that τ is another linear transformation of U into V represented
by the matrix B = [b_ij]. Then for the sum σ + τ we have

    (σ + τ)(α_j) = σ(α_j) + τ(α_j) = Σ_{i=1}^m a_ij β_i + Σ_{i=1}^m b_ij β_i
                 = Σ_{i=1}^m (a_ij + b_ij) β_i.                 (2.5)

Thus σ + τ is represented by the matrix [a_ij + b_ij]. Accordingly, we
define the sum of two matrices to be that matrix obtained by the addition
of the corresponding elements in the two arrays; A + B = [a_ij + b_ij] is
the matrix corresponding to σ + τ. The sum of two matrices is defined if
and only if the two matrices have the same number of rows and the same
number of columns.
If a is any scalar, for the linear transformation aσ we have

    (aσ)(α_j) = a Σ_{i=1}^m a_ij β_i = Σ_{i=1}^m (a a_ij) β_i.  (2.6)

Thus aσ is represented by the matrix [a a_ij]. We therefore define scalar
multiplication by the rule aA = [a a_ij].

Let W be a third vector space of dimension r over the field F, and let
C = {γ₁, ..., γ_r} be an arbitrary but fixed basis of W. If the linear trans-
formation σ of U into V is represented by the m × n matrix A = [a_ij] and the
linear transformation τ of V into W is represented by the r × m matrix
B = [b_ki], what matrix represents the linear transformation τσ of U into W?

    (τσ)(α_j) = τ(σ(α_j)) = τ( Σ_{i=1}^m a_ij β_i ) = Σ_{i=1}^m a_ij τ(β_i)
              = Σ_{i=1}^m a_ij ( Σ_{k=1}^r b_ki γ_k )
              = Σ_{k=1}^r ( Σ_{i=1}^m b_ki a_ij ) γ_k.          (2.7)

Thus, if we define c_kj = Σ_{i=1}^m b_ki a_ij, then C = [c_kj] is the matrix representing
the product transformation τσ. Accordingly, we call C the matrix product
of B and A, in that order: C = BA.

For computational purposes it is customary to write the arrays of B
and A side by side. The element c_kj of the product is then obtained by
multiplying the corresponding elements of row k of B and column j of A
and adding. We can trace the elements of row k of B with a finger of the
left hand while at the same time tracing the elements of column j of A with
a finger of the right hand. At each step we compute the product of the
corresponding elements and accumulate the sum as we go along. Using
this simple rule we can, with practice, become quite proficient, even to the
point of doing it "without hands."
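The row-by-column rule is easy to state in code. The following sketch (plain Python; the helper name mat_mult is our own) computes c_kj = Σ_i b_ki a_ij exactly as in (2.7); it is an illustration, not a prescribed algorithm.

```python
def mat_mult(B, A):
    """Product C = BA from the rule c_kj = sum_i b_ki * a_ij of equation (2.7).

    B is r x m and A is m x n, each given as a list of rows."""
    r, m, n = len(B), len(A), len(A[0])
    assert all(len(row) == m for row in B), "columns of B must match rows of A"
    # Row k of B traced against column j of A, accumulating the sum as we go:
    C = [[sum(B[k][i] * A[i][j] for i in range(m)) for j in range(n)]
         for k in range(r)]
    return C

print(mat_mult([[0, -1], [1, 0]], [[1, 2], [3, 4]]))   # [[-3, -4], [1, 2]]
```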

Check the process in the following examples : 



1 4 


-1 


2 


1 


-2 1 


-2 





"l 


-f 






2" 





2 




" 5 


2" 


3 


2 


1 


= 


11 


-1 


2_ 











-2_ 




3 


-2 







All definitions and properties we have established for linear transformations
can be carried over immediately for matrices. For example, we have:

1. 0 · A = 0. (The "0" on the left is a scalar, the "0" on the right is a
matrix with the same number of rows and columns as A.)

2. 1 · A = A.

3. A(B + C) = AB + AC.

4. (A + B)C = AC + BC.

5. A(BC) = (AB)C.

Of course, in each of the above statements we must assume the operations
proposed are well defined. For example, in 3, B and C must be the same
size and A must have the same number of columns as B and C have rows.
The rank and nullity of a matrix A are the rank and nullity of the associated 
linear transformation, respectively. 

Theorem 2.1. For an m x n matrix A, the rank of A plus the nullity of A 
is equal to n. The rank of a product BA is less than or equal to the rank of 
either factor. 

These statements have been established for linear transformations and 
therefore hold for their corresponding matrices. □ 
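A quick numerical check of the second statement of Theorem 2.1, using the numpy library (the particular matrices are chosen only for illustration):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [2, 4, 6],
              [0, 1, 1]])      # rank 2 (row 2 is twice row 1)
B = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 2]])      # rank 2 (row 3 is the sum of rows 1 and 2)

print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(B))   # 2 2
print(np.linalg.matrix_rank(B @ A))                         # 2, at most min of the two ranks
```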

The rank of σ is the dimension of the subspace Im(σ) of V. Since Im(σ)
is spanned by {σ(α₁), ..., σ(αₙ)}, ρ(σ) is the number of elements in a
maximal linearly independent subset of {σ(α₁), ..., σ(αₙ)}. Expressed
in terms of coordinates, σ(α_j) = Σ_{i=1}^m a_ij β_i is represented by the m-tuple
(a_1j, a_2j, ..., a_mj), which is the m-tuple in column j of the matrix [a_ij].
Thus ρ(σ) = ρ(A) is also equal to the maximum number of linearly inde-
pendent columns of A. This is usually called the column rank of a matrix
A, and the maximum number of linearly independent rows of A is called
the row rank of A. We, however, show before long that the number of
linearly independent rows in a matrix is equal to the number of linearly
independent columns. Until that time we consider "rank" and "column
rank" as synonymous.

Returning to Equation (2.3), we see that, if ξ ∈ U is represented by
(x₁, ..., xₙ) and the linear transformation σ of U into V is represented
by the matrix A = [a_ij], then σ(ξ) ∈ V is represented by (y₁, ..., y_m) where

    y_i = Σ_{j=1}^n a_ij x_j     (i = 1, ..., m).               (2.8)



In view of the definition of matrix multiplication given by Equation (2.7)
we can interpret Equations (2.8) as a matrix product of the form

    Y = AX                                                      (2.9)

where

    Y = [ y₁ ]          and          X = [ x₁ ]
        [ ⋮  ]                           [ ⋮  ]
        [ y_m ]                          [ xₙ ]

This single matric equation contains the m equations in (2.8).

We have already used the n-tuple (x₁, ..., xₙ) to represent the vector
ξ = Σ_{i=1}^n x_i α_i. Because of the usefulness of equation (2.9) we also find it
convenient to represent ξ by the one-column matrix X. In fact, since it is
somewhat wasteful of space and otherwise awkward to display one-column
matrices we use the n-tuple (x₁, ..., xₙ) to represent not only the vector ξ
but also the column matrix X. With this convention [x₁ ⋯ xₙ] is a one-row
matrix and (x₁, ..., xₙ) is a one-column matrix.

Notice that we have now used matrices for two different purposes, (1) to
represent linear transformations, and (2) to represent vectors. The single
matric equation Y = AX contains some matrices used in each way.
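The convention can be mirrored in a computation: below, a numpy sketch treats X as a one-column matrix, so that the single product Y = AX produces all of the y_i of (2.8) at once. The particular matrix and vector are illustrative only.

```python
import numpy as np

A = np.array([[1, 2, 0],
              [0, 1, 3]])          # represents a map of a 3-dimensional U into a 2-dimensional V
X = np.array([[2], [1], [1]])      # the n-tuple (2, 1, 1) written as a one-column matrix
Y = A @ X                          # the single matric equation Y = AX of (2.9)
print(Y.ravel())                   # [4 4], i.e. y_i = sum_j a_ij x_j as in (2.8)
```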



EXERCISES 

1 . Verify the matrix multiplication in the following examples : 



(«) 



(*) 



3 
-5 



1 -2" 

2 3 



' 2 

-1 

1 



' 2 

-1 

1 



1 


-3" 




6 


1 


= 





-2 





3 
-9 



1 


-3" 


" 2" 




"10" 


6 


1 


3 


= 


15 





-2 


-1 




4 



(c) 



2. Compute 



3 
-5 



3 
-9 



-4' 
11 



10" 
15 
4 

2" 

3 

-1 



37 



-4 
11 



Interpret the answer to this problem in terms of the computations in Exercise 1. 
3. Find AB and BA if 



A = 



"10 1" 
110 
10 10 
10 1 



B = 



' 1 

5 

-1 



2 3 

6 7 

-2 -3 



-4 



-5 -6 -7 -! 



4. Let σ be a linear transformation of R² into itself that maps (1, 0) onto (3, −1)
and (0, 1) onto (−1, 2). Determine the matrix representing σ with respect to the
bases A = B = {(1, 0), (0, 1)}.




5. Let σ be a linear transformation of R² into itself that maps (1, 1) onto (2, −3)
and (1, −1) onto (4, −7). Determine the matrix representing σ with respect to the
bases A = B = {(1, 0), (0, 1)}. (Hint: We must determine the effect of σ when it is
applied to (1, 0) and (0, 1). Use the fact that (1, 0) = ½(1, 1) + ½(1, −1) and the
linearity of σ.)

6. It happens that the linear transformation defined in Exercise 4 is one-to-one,
that is, σ does not map two different vectors onto the same vector. Thus, there is
a linear transformation that maps (3, −1) onto (1, 0) and (−1, 2) onto (0, 1).
This linear transformation reverses the mapping given by σ. Determine the matrix
representing it with respect to the same bases.

7. Let us consider the geometric meaning of linear transformations. A linear
transformation of R² into itself leaves the origin fixed (why?) and maps straight
lines into straight lines. (The word "into" is required here because the image of a
straight line may be another straight line or it may be a single point.) Prove that
the image of a straight line is a subset of a straight line. (Hint: Let σ be represented
by the matrix

    A = [ a_11  a_12 ]
        [ a_21  a_22 ]

Then σ maps (x, y) onto (a_11 x + a_12 y, a_21 x + a_22 y). Now show that if (x, y)
satisfies the equation ax + by = c its image satisfies the equation

    (a a_22 − b a_21)x + (a_11 b − a_12 a)y = (a_11 a_22 − a_12 a_21)c.)

8. (Continuation) We say that a straight line is mapped onto itself if every
point on the line is mapped onto a point on the line (but not all onto the same
point) even though the points on the line may be moved around.

(a) A linear transformation maps (1, 0) onto (−1, 0) and (0, 1) onto (0, −1).
Show that every line through the origin is mapped onto itself. Show that each
such line is mapped onto itself with the sense of direction inverted. This linear
transformation is called an inversion with respect to the origin. Find the matrix
representing this linear transformation with respect to the basis {(1, 0), (0, 1)}.

(b) A linear transformation maps (1, 1) onto (−1, −1) and leaves (1, −1)
fixed. Show that every line perpendicular to the line x₁ + x₂ = 0 is mapped onto
itself with the sense of direction inverted. Show that every point on the line
x₁ + x₂ = 0 is left fixed. Which lines through the origin are mapped onto them-
selves? This linear transformation is called a reflection about the line x₁ + x₂ = 0.
Find the matrix representing this linear transformation with respect to the basis
{(1, 0), (0, 1)}. Find the matrix representing this linear transformation with
respect to the basis {(1, 1), (1, −1)}.

(c) A linear transformation maps (1, 1) onto (2, 2) and (1, −1) onto (3, −3).
Show that the lines through the origin and passing through the points (1, 1) and
(1, −1) are mapped onto themselves and that no other lines are mapped onto
themselves. Find the matrices representing this linear transformation with respect
to the bases {(1, 0), (0, 1)} and {(1, 1), (1, −1)}.






(d) A linear transformation leaves (1, 0) fixed and maps (0, 1) onto (1, 1). Show
that each line x₂ = c is mapped onto itself and translated within itself a distance
equal to c. This linear transformation is called a shear. Which lines through the
origin are mapped onto themselves? Find the matrix representing this linear
transformation with respect to the basis {(1, 0), (0, 1)}.

(e) A linear transformation maps (1, 0) onto (3/5, 4/5) and (0, 1) onto (−4/5, 3/5).
Show that every line through the origin is rotated counterclockwise through the
angle θ = arc cos 3/5. This linear transformation is called a rotation. Find the
matrix representing this linear transformation with respect to the basis {(1, 0),
(0, 1)}.

(f) A linear transformation maps (1, 0) onto (2/3, 2/3) and (0, 1) onto (1/3, 1/3). Show
that each point on the line 2x₁ + x₂ = 3c is mapped onto the single point (c, c).
The line x₁ − x₂ = 0 is left fixed. The only other line through the origin which
is mapped into itself is the line 2x₁ + x₂ = 0. This linear transformation is called
a projection onto the line x₁ − x₂ = 0 parallel to the line 2x₁ + x₂ = 0. Find the
matrices representing this linear transformation with respect to the bases {(1, 0),
(0, 1)} and {(1, 1), (1, −2)}.

9. (Continuation) Describe the geometric effect of each of the linear transforma- 
tions of R 2 into itself represented by the matrices 



(«) 



"o r 


(b) 


"0 0" 


(c) 


"l r 




1 




1 









"1 0" 




~b 0" 




r3 4-1 
5 5 


a 1 


(e) 


c 


if) 


4 3 

5 5 



id) 



(Hint: In Exercise 7 we have shown that straight lines are mapped into straight
lines. We already know that linear transformations map the origin onto the origin.
Thus it is relatively easy to determine what happens to straight lines passing through
the origin. For example, to see what happens to the x₁-axis it is sufficient to see
what happens to the point (1, 0). Among the transformations given appear a
rotation, a reflection, two projections, and one shear.)

10. (Continuation) For the linear transformations given in Exercise 9 find all
lines through the origin which are mapped onto or into themselves.

11. Let U = R² and V = R³ and σ be a linear transformation of U into V that
maps (1, 1) onto (0, 1, 2) and (−1, 1) onto (2, 1, 0). Determine the matrix that
represents σ with respect to the bases A = {(1, 0), (0, 1)} in R² and B = {(1, 0, 0), (0, 1, 0),
(0, 0, 1)} in R³. (Hint: ½(1, 1) − ½(−1, 1) = (1, 0).)

12. What is the effect of multiplying an n × n matrix A by an n × n diagonal
matrix D? What is the difference between AD and DA?

13. Let a and b be two numbers such that a ≠ b. Find all 2 × 2 matrices A
such that

    A [ a  0 ] = [ a  0 ] A.
      [ 0  b ]   [ 0  b ]




14. Show that the matrix C = [a_i b_j] has rank one if not all a_i and not all b_j are
zero. (Hint: Use Theorem 1.12.)

15. Let a, b, c, and d be given numbers (real or complex) and consider the
function

    f(x) = (ax + b) / (cx + d).

Let g be another function of the same form. Show that gf, where gf(x) = g(f(x)),
is a function that can also be written in the same form. Show that each of these
functions can be represented by a matrix in such a way that the matrix representing
gf is the product of the matrices representing g and f. Show that the inverse function
exists if and only if ad − bc ≠ 0. To what does the function reduce if ad − bc = 0?

16. Consider complex numbers of the form x + yi (where x and y are real
numbers and i² = −1) and represent such a complex number by the duple (x, y)
in R². Let a + bi be a fixed complex number. Consider the function f defined by
the rule

    f(x + yi) = (a + bi)(x + yi) = u + vi.

(a) Show that this function is a linear transformation of R² into itself mapping
(x, y) onto (u, v).

(b) Find the matrix representing this linear transformation with respect to the
basis {(1, 0), (0, 1)}.

(c) Find the matrix which represents the linear transformation obtained by using
c + di in place of a + bi. Compute the product of these two matrices. Do they
commute?

(d) Determine the complex number which can be used in place of a + bi to
obtain a transformation represented by this matrix product. How is this complex
number related to a + bi and c + di?

17. Show by example that it is possible for two matrices A and B to have the 
same rank while A 2 and B 2 have different ranks. 



3 | Non-singular Matrices

Let us consider the case where U = V, that is, we are considering trans-
formations of V into itself. Generally, a homomorphism of a set into itself
is called an endomorphism. We consider a fixed basis in V and represent
the linear transformation of V into itself with respect to that basis. In this
case the matrices are square, or n × n, matrices. Since the transformations
we are considering map V into itself any finite number of them can be iterated
in any order. The commutative law does not hold, however. The same
remarks hold for square matrices. They can be multiplied in any order but






the commutative law does not hold. For example

    [ 0  1 ] [ 0  0 ]   [ 0  1 ]
    [ 0  0 ] [ 0  1 ] = [ 0  0 ]

    [ 0  0 ] [ 0  1 ]   [ 0  0 ]
    [ 0  1 ] [ 0  0 ] = [ 0  0 ]



The linear transformation that leaves every element of V fixed is the
identity transformation. We denote the identity transformation by 1,
the scalar identity. Clearly, the identity transformation is represented by
the matrix I = [δ_ij] for any choice of the basis. Notice that IA = AI = A
for any n × n matrix A. I is called the identity matrix, or unit matrix, of
order n. If we wish to point out the dimension of the space we write Iₙ for
the identity matrix of order n. The scalar transformation a · 1 is represented
by the matrix aI. Matrices of the form aI are called scalar matrices.

Definition. A one-to-one linear transformation σ of a vector space onto
itself is called an automorphism. An automorphism is only a special kind of
isomorphism for which the domain and codomain are the same space. If
σ(α) = α′, the mapping σ⁻¹(α′) = α is called the inverse transformation of σ.
The rotations represented in Section 2 are examples of automorphisms.

Theorem 3.1. The inverse σ⁻¹ of an automorphism σ is an automorphism.

Theorem 3.2. A linear transformation σ of an n-dimensional vector space
into itself is an automorphism if and only if it is of rank n; that is, if and only if
it is an epimorphism.

Theorem 3.3. A linear transformation σ of an n-dimensional vector space
into itself is an automorphism if and only if its nullity is 0, that is, if and only
if it is a monomorphism.

proof (of Theorems 3.1, 3.2, and 3.3). These properties have already
been established for isomorphisms. □

Since it is clear that transformations of rank less than n do not have
inverses because they are not onto, we see that automorphisms are the
only linear transformations which have inverses. A linear transformation
that has an inverse is said to be non-singular or invertible; otherwise it is
said to be singular. Let A be the matrix representing the automorphism
σ, and let A⁻¹ be the matrix representing the inverse transformation σ⁻¹.
The matrix A⁻¹A represents the transformation σ⁻¹σ. Since σ⁻¹σ is the
identity transformation, we must have A⁻¹A = I. But σ is also the inverse
transformation of σ⁻¹ so that σσ⁻¹ = 1 and AA⁻¹ = I. We shall refer to
A⁻¹ as the inverse of A. A matrix that has an inverse is said to be non-
singular or invertible. Only a square matrix can have an inverse.
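As an illustration (not part of the text), the relations A⁻¹A = AA⁻¹ = I can be checked numerically with the numpy library; the matrix chosen here is only an example.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0]])                 # non-singular: rank 2
A_inv = np.linalg.inv(A)                   # numpy's inverse; here it is [[1, -2], [0, 1]]
print(np.allclose(A_inv @ A, np.eye(2)))   # True
print(np.allclose(A @ A_inv, np.eye(2)))   # True
```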






On the other hand suppose that for the matrix A there exists a matrix
B such that BA = I. Since I is of rank n, A must also be of rank n and,
therefore, A represents an automorphism σ. Furthermore, the linear
transformation which B represents is necessarily the inverse transformation
σ⁻¹ since the product with σ must yield the identity transformation. Thus
B = A⁻¹. The same kind of argument shows that if C is a matrix such that
AC = I, then C = A⁻¹. Thus we have shown:

Theorem 3.4. If A and B are square matrices such that BA = I, then
AB = I. If A and B are square matrices such that AB = I, then BA = I.
In either case B is the unique inverse of A. □

Theorem 3.5. If A and B are non-singular, then (1) AB is non-singular and
(AB)⁻¹ = B⁻¹A⁻¹, (2) A⁻¹ is non-singular and (A⁻¹)⁻¹ = A, (3) for a ≠ 0,
aA is non-singular and (aA)⁻¹ = a⁻¹A⁻¹.

proof. In view of the remarks preceding Theorem 3.4 it is sufficient in
each case to produce a matrix which will act as a left inverse.

(1) (B⁻¹A⁻¹)(AB) = B⁻¹(A⁻¹A)B = B⁻¹IB = B⁻¹B = I.

(2) AA⁻¹ = I.

(3) (a⁻¹A⁻¹)(aA) = (a⁻¹a)(A⁻¹A) = I. □

Theorem 3.6. If A is non-singular, we can solve uniquely the equations
XA = B and AY = B for any matrix B of the proper size, but the two solutions
need not be equal.

proof. Solutions exist since (BA⁻¹)A = B(A⁻¹A) = B and A(A⁻¹B) =
(AA⁻¹)B = B. The solutions are unique since for any C having the property
that CA = B we have C = CAA⁻¹ = BA⁻¹, and similarly with any solution
of AY = B. □

As an example illustrating the last statement of the theorem, let

    A = [ 1  2 ],     A⁻¹ = [ 1  −2 ],     and     B = [ 1  0 ]
        [ 0  1 ]            [ 0   1 ]                  [ 2  1 ]

Then

    X = BA⁻¹ = [ 1  −2 ]     and     Y = A⁻¹B = [ −3  −2 ]
               [ 2  −3 ]                        [  2   1 ]



We add the remark that for non-singular A, the solution of XA = B 
exists and is unique if B has n columns, and the solution of AY = B exists 
and is unique if B has n rows. The proof given for Theorem 3.6 applies 
without change. 
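The computation in the example above can be repeated with the numpy library. The sketch below uses the same A and B and shows that X = BA⁻¹ and Y = A⁻¹B both satisfy their respective equations even though they are different matrices.

```python
import numpy as np

A = np.array([[1, 2], [0, 1]])
B = np.array([[1, 0], [2, 1]])
A_inv = np.linalg.inv(A)

X = B @ A_inv        # solves XA = B
Y = A_inv @ B        # solves AY = B
print(X)             # [[ 1. -2.]  [ 2. -3.]]
print(Y)             # [[-3. -2.]  [ 2.  1.]]
print(np.allclose(X @ A, B), np.allclose(A @ Y, B))   # True True
```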






Theorem 3.7. The rank of a (not necessarily square) matrix is not changed
by multiplication by a non-singular matrix.

proof. Let A be non-singular and let B be of rank ρ. Then by Theorem
2.1 AB is of rank r ≤ ρ, and A⁻¹(AB) = B is of rank ρ ≤ r. Thus r = ρ.
The proof that BA is of rank ρ is similar. □

Theorem 1.14 states the corresponding property for linear transformations. 

The existence or non-existence of the inverse of a square matrix depends 
on the matrix itself and not on whether it represents a linear transformation 
of a vector space into itself or a linear transformation of one vector space 
into another. Thus it is convenient and consistent to extend our usage of the 
term "non-singular" to include isomorphisms. Accordingly any square 
matrix with an inverse is non-singular. 

Let U and V be vector spaces of dimension n over the field F. Let A =
{α₁, ..., αₙ} be a basis of U and B = {β₁, ..., βₙ} be a basis of V. If
ξ = Σ_{i=1}^n x_i α_i is any vector in U we can define σ(ξ) to be Σ_{i=1}^n x_i β_i. It is
easily seen that σ is an isomorphism and that ξ and σ(ξ) are both repre-
sented by (x₁, ..., xₙ) ∈ Fⁿ. Thus any two vector spaces of the same
dimension over F are isomorphic. As far as their internal structure is con-
cerned they are indistinguishable. Whatever properties may serve to dis-
tinguish them are, by definition, not vector space properties.



EXERCISES 

1. Show that the inverse of

    A = [ 1  2  3 ]        is        A⁻¹ = [ −2   0   1 ]
        [ 2  3  4 ]                        [  0   3  −2 ]
        [ 3  4  6 ]                        [  1  −2   1 ]

2. Find the square of the matrix

    A = (1/3) [ 1   2   2 ]
              [ 2  −2   1 ]
              [ 2   1  −2 ]

What is the inverse of A? (Geometrically, this matrix represents a 180° rotation
about the line containing the vector (2, 1, 1). The inverse obtained is therefore
not surprising.)

3. Compute the image of the vector (1, −2, 1) under the linear transformation
represented by the matrix

    A = [ 1  2  3 ]
        [ 2  3  4 ]
        [ 0  1  2 ]

Show that A cannot have an inverse.



4. Since

    [  3  −1 ] [ x_11  x_12 ]   [ 3x_11 − x_21     3x_12 − x_22   ]
    [ −5   2 ] [ x_21  x_22 ] = [ −5x_11 + 2x_21   −5x_12 + 2x_22 ]

we can find the inverse of

    [  3  −1 ]
    [ −5   2 ]

by solving the equations

    3x_11 − x_21 = 1            3x_12 − x_22 = 0
    −5x_11 + 2x_21 = 0          −5x_12 + 2x_22 = 1.

Solve these equations and check your answer by showing that this gives the inverse
matrix.

We have not as yet developed convenient and effective methods for obtaining 
the inverse of a given matrix. Such methods are developed later in this chapter 
and in the following chapter. If we know the geometric meaning of the matrix, 
however, it is often possible to obtain the inverse with very little work. 



5. The matrix

    [ 3/5  −4/5 ]
    [ 4/5   3/5 ]

represents a rotation about the origin through the angle θ = arc cos 3/5. What rotation
would be the inverse of this rotation? What matrix would represent this inverse
rotation? Show that this matrix is the inverse of the given matrix.

6. The matrix

    [  0  −1 ]
    [ −1   0 ]

represents a reflection about the line x₁ + x₂ = 0. What operation is the inverse
of this reflection? What matrix represents the inverse operation? Show that this
matrix is the inverse of the given matrix.

7. The matrix

    [ 1  1 ]
    [ 0  1 ]

represents a shear. The inverse transformation is also a shear. Which one? What
matrix represents the inverse shear? Show that this matrix is the inverse of the
given matrix.

8. Show that the transformation that maps (x₁, x₂, x₃) onto (x₃, −x₁, x₂) is an
automorphism of F³. Find the matrix representing this automorphism and its
inverse with respect to the basis {(1, 0, 0), (0, 1, 0), (0, 0, 1)}.

9. Show that an automorphism of a vector space maps every subspace onto a
subspace of the same dimension.

10. Find an example to show that there exist non-square matrices A and B
such that AB = I. Specifically, show that there is an m × n matrix A and an
n × m matrix B such that AB is the m × m identity. Show that BA is not the
n × n identity. Prove in general that if m ≠ n, then AB and BA cannot both be
identity matrices.




4 | Change of Basis

We have represented vectors and linear transformations as n-tuples and
matrices with respect to arbitrary but fixed bases. A very natural question
arises: What changes occur in these representations if other choices for
bases are made? The vectors and linear transformations have meaning
independent of any particular choice of bases, independent of any coordinate
systems, but their representations are entirely dependent on the bases chosen.

Definition. Let A = {α₁, ..., αₙ} and A′ = {α′₁, ..., α′ₙ} be bases of the
vector space U. In a typical "change of basis" situation the representations
of various vectors and linear transformations are known in terms of the
basis A, and we wish to determine their representations in terms of the
basis A′. In this connection, we refer to A as the "old" basis and to A′ as
the "new" basis. Each α′_j is expressible as a linear combination of the
elements of A; that is,

    α′_j = Σ_{i=1}^n p_ij α_i.                                  (4.1)

The associated matrix P = [p_ij] is called the matrix of transition from the
basis A to the basis A′.

The columns of P are the n-tuples representing the new basis vectors in
terms of the old basis. This simple observation is worth remembering as
it is usually the key to determining P when a change of basis is made. Since
the columns of P are the representations of the basis A′ they are linearly
independent and P has rank n. Thus P is non-singular.

Now let ξ = Σ_{i=1}^n x_i α_i be an arbitrary vector of U and let ξ = Σ_{j=1}^n x′_j α′_j
be the representation of ξ in terms of the basis A′. Then

    ξ = Σ_{j=1}^n x′_j α′_j = Σ_{j=1}^n x′_j ( Σ_{i=1}^n p_ij α_i )
      = Σ_{i=1}^n ( Σ_{j=1}^n p_ij x′_j ) α_i.                  (4.2)

Since the representation of ξ with respect to the basis A is unique we see
that x_i = Σ_{j=1}^n p_ij x′_j. Notice that the rows of P are used to express the
old coordinates of ξ in terms of the new coordinates. For emphasis and
contradistinction, we repeat that the columns of P are used to express the
new basis vectors in terms of the old basis vectors.

Let X = (x₁, ..., xₙ) and X′ = (x′₁, ..., x′ₙ) be n × 1 matrices representing
the vector ξ with respect to the bases A and A′. Then the set of relations
{x_i = Σ_{j=1}^n p_ij x′_j} can be written as the single matric equation

    X = PX′.                                                    (4.3)
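A small numerical sketch (using numpy; the new basis {(0, 1, 1), (1, 0, 1), (1, 1, 0)} of R³ is chosen only for illustration) shows how the columns of P and equation (4.3) are used in practice.

```python
import numpy as np

# Old basis: the standard basis of R^3.  New basis A' = {(0,1,1), (1,0,1), (1,1,0)}.
P = np.column_stack([[0, 1, 1], [1, 0, 1], [1, 1, 0]])   # columns = new basis in old coordinates

X_new = np.array([1, 2, 3])        # coordinates of a vector xi with respect to A'
X_old = P @ X_new                  # equation (4.3): X = PX'
print(X_old)                       # [5 4 3]

# Recover the new coordinates from the old ones with the inverse matrix of transition:
print(np.linalg.solve(P, X_old))   # [1. 2. 3.]
```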




Now suppose that we have a linear transformation σ of U into V and
that A = [a_ij] is the matrix representing σ with respect to the bases A in
U and B = {β₁, ..., β_m} in V. We shall now determine the representation
of σ with respect to the bases A′ and B.

    σ(α′_j) = σ( Σ_{k=1}^n p_kj α_k ) = Σ_{k=1}^n p_kj σ(α_k)
            = Σ_{k=1}^n p_kj ( Σ_{i=1}^m a_ik β_i )
            = Σ_{i=1}^m ( Σ_{k=1}^n a_ik p_kj ) β_i.            (4.4)

Since B is a basis, a′_ij = Σ_{k=1}^n a_ik p_kj and the matrix A′ = [a′_ij] representing
σ with respect to the bases A′ and B is related to A by the matric equation

    A′ = AP.                                                    (4.5)

This relation can also be demonstrated in a slightly different way. For
an arbitrary ξ = Σ_{i=1}^n x_i α_i ∈ U let σ(ξ) = Σ_{i=1}^m y_i β_i. Then we have

    Y = AX = A(PX′) = (AP)X′.                                   (4.6)

Thus AP is a matrix representing σ with respect to the bases A′ and B. Since
the matrix representing σ is uniquely determined by the choice of bases we
have A′ = AP.

Now consider the effect of a change of basis in the image space V. Thus
let B be replaced by the basis B′ = {β′₁, ..., β′_m}. Let Q = [q_ij] be the
matrix of transition from B to B′, that is, β′_j = Σ_{i=1}^m q_ij β_i. Then if A″ =
[a″_ij] represents σ with respect to the bases A and B′ we have

    σ(α_j) = Σ_{i=1}^m a_ij β_i = Σ_{k=1}^m a″_kj β′_k = Σ_{k=1}^m a″_kj ( Σ_{i=1}^m q_ik β_i )
           = Σ_{i=1}^m ( Σ_{k=1}^m q_ik a″_kj ) β_i.            (4.7)

Since the representation of σ(α_j) in terms of the basis B is unique we see
that A = QA″, or

    A″ = Q⁻¹A.                                                  (4.8)

Combining these results, we see that, if both changes of bases are made at
once, the new matrix representing σ is Q⁻¹AP.

As in the proof of Theorem 1.6 we can choose a new basis A′ = {α′₁, ..., α′ₙ}
of U such that the last ν = n − ρ basis elements form a basis of K(σ). Since
{σ(α′₁), ..., σ(α′_ρ)} is a basis of σ(U) and is linearly independent in V, it can
be extended to a basis B′ of V. With respect to the bases A′ and B′ we have
σ(α′_j) = β′_j for j ≤ ρ while σ(α′_j) = 0 for j > ρ. Thus the new matrix Q⁻¹AP
representing σ is of the form

        ρ columns    ν columns
      [ 1        0 | 0  ⋯  0 ]
      [    ⋱       | ⋮     ⋮ ]   ρ rows
      [ 0        1 | 0  ⋯  0 ]
      [ 0   ⋯    0 | 0  ⋯  0 ]
      [ ⋮        ⋮ | ⋮     ⋮ ]   m − ρ rows
      [ 0   ⋯    0 | 0  ⋯  0 ]

Thus we have

Theorem 4.1. If A is any m × n matrix of rank ρ, there exist a non-
singular n × n matrix P and a non-singular m × m matrix Q such that
A′ = Q⁻¹AP has the first ρ elements of the main diagonal equal to 1, and
all other elements equal to zero. □

When A and B are unrestricted we can always obtain this relatively simple
representation of a linear transformation by a proper choice of bases.
More interesting situations occur when A and B are restricted. Suppose,
for example, that we take U = V and A = B. In this case there is but one
basis to change and but one matrix of transition, that is, P = Q. In this
case it is not possible to obtain a form of the matrix representing σ as simple
as that obtained in Theorem 4.1. We say that any two matrices representing
the same linear transformation σ of a vector space V into itself are similar.
This is equivalent to saying that two matrices A and A′ are similar if and
only if there exists a non-singular matrix of transition P such that A′ =
P⁻¹AP. This case occupies much of our attention in Chapters III and V.
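The case A′ = P⁻¹AP can be illustrated numerically (numpy; the rotation matrix and basis below are chosen only as an example): the similar matrix represents the same 90° rotation, but with respect to the basis {(1, 1), (1, −1)}.

```python
import numpy as np

A = np.array([[0, -1],
              [1,  0]])                  # a 90-degree rotation in the standard basis
P = np.column_stack([[1, 1], [1, -1]])   # matrix of transition to the basis {(1,1), (1,-1)}

A_prime = np.linalg.inv(P) @ A @ P       # the similar matrix P^{-1} A P
print(A_prime)                           # [[ 0.  1.] [-1.  0.]]

# Both matrices have the same rank, as Theorem 3.7 requires:
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A_prime))   # 2 2
```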



EXERCISES 

1. In P₃, the space of polynomials of degree 2 or smaller with coefficients in
F, let A = {1, x, x²}.

    A′ = {p₁(x) = x² + x + 1,  p₂(x) = x² − x − 2,  p₃(x) = x² + x − 1}

is also a basis. Find the matrix of transition from A to A′.




2. In many of the uses of the concepts of this section it is customary to take
A = {α_i | α_i = (δ_i1, δ_i2, ..., δ_in)} as the old basis in Rⁿ. Thus, in R² let A =
{(1, 0), (0, 1)} and A′ = {(½, √3/2), (−√3/2, ½)}. Show that

    P = [  ½     −√3/2 ]
        [ √3/2     ½   ]

is the matrix of transition from A to A′.

3. (Continuation) With A′ and A as in Exercise 2, find the matrix of transition R
from A′ to A. (Notice, in particular, that in Exercise 2 the columns of P are the
components of the vectors in A′ expressed in terms of basis A, whereas in this exercise
the columns of R are the components of the vectors in A expressed in terms of the
basis A′. Thus these two matrices of transition are determined relative to different
bases.) Show that RP = I.

4. (Continuation) Consider the linear transformation σ of R² into itself which
maps

    (1, 0) onto (½, √3/2)
    (0, 1) onto (−√3/2, ½).

Find the matrix A that represents σ with respect to the basis A.

You should obtain A = P. However, A and P do not represent the same thing.
To see this, let ξ = (x₁, x₂) be an arbitrary vector in R² and compute σ(ξ) by means
of formula (2.9) and the new coordinates of ξ by means of formula (4.3).

A little reflection will show that the results obtained are entirely reasonable.
The matrix A represents a rotation of the real plane counterclockwise through an
angle of π/3. The matrix P represents a rotation of the coordinate axes counter-
clockwise through an angle of π/3. In the latter case the motion of the plane
relative to the coordinate axes is clockwise through an angle of π/3.

5. In R³ let A = {(1, 0, 0), (0, 1, 0), (0, 0, 1)} and let A′ = {(0, 1, 1), (1, 0, 1),
(1, 1, 0)}. Find the matrix of transition P from A to A′ and the matrix of transition
P⁻¹ from A′ to A.

6. Let A, B, and C be three bases of V. Let P be the matrix of transition from A
to B and let Q be the matrix of transition from B to C. Is PQ or QP the matrix of
transition from A to C? Compare the order of multiplication of matrices of transi-
tion and matrices representing linear transformations.

7. Use the results of Exercise 6 to resolve the question raised in the parenthetical
remark of Exercise 3, and implicitly assumed in Exercise 5. If P is the matrix of
transition from A to A′ and Q is the matrix of transition from A′ to A, show that
PQ = I.

5 | Hermite Normal Form

We may also ask how much simplification of the matrix representing a
linear transformation σ of U into V can be effected by a change of basis in
V alone. Let A = {α₁, ..., αₙ} be the given basis in U and let U_k = ⟨α₁, ...,
α_k⟩. The subspaces σ(U_k) of V form a non-decreasing chain of subspaces with
σ(U_{k−1}) ⊆ σ(U_k) and σ(U_n) = σ(U). Since σ(U_k) = σ(U_{k−1}) + ⟨σ(α_k)⟩ we see
from Theorem 4.8 of Chapter I that dim σ(U_k) ≤ dim σ(U_{k−1}) + 1; that is,
the dimensions of the σ(U_k) do not increase by more than 1 at a time as k
increases. Since dim σ(U_n) = ρ, the rank of σ, an increase of exactly 1
must occur ρ times. For the other times, if any, we must have dim σ(U_k) =
dim σ(U_{k−1}) and hence σ(U_k) = σ(U_{k−1}). We have an increase by 1 when
σ(α_k) ∉ σ(U_{k−1}) and no increase when σ(α_k) ∈ σ(U_{k−1}).

Let k₁, k₂, ..., k_ρ be those indices for which σ(α_{k_i}) ∉ σ(U_{k_i−1}). Let
β_i = σ(α_{k_i}). Since β_i ∉ σ(U_{k_i−1}) ⊇ ⟨β₁, ..., β_{i−1}⟩, the set {β₁, ..., β_i} is
linearly independent (see Theorem 2.3, Chapter I-2). Since {β₁, ..., β_ρ} ⊆
σ(U) and σ(U) is of dimension ρ, {β₁, ..., β_ρ} is a basis of σ(U). This set
can be extended to a basis B′ of V. Let us now determine the form of the
matrix A′ representing σ with respect to the bases A and B′.

Since σ(α_{k_i}) = β_i, column k_i has a 1 in row i and all other elements of
this column are 0's. For k_i < j < k_{i+1}, σ(α_j) ∈ σ(U_{k_i}) so that column j
has 0's below row i. In general, there is no restriction on the elements of
column j in the first i rows. A′ thus has the form



column 



column 







*1 




/c 2 




• 


• 


1 


"l,fcl+l • 


.. 


r 

"l,fc 2 +l 


• 


• 








• • 1 


a 2,fc 2 +l 


• 


• 








• 






















(5.1) 



Once A and σ are given, the k_i and the set {β₁, ..., β_ρ} are uniquely
determined. There may be many ways to extend this set to the basis B′,
but the additional basis vectors do not affect the determination of A′ since
every element of σ(U) can be expressed in terms of {β₁, ..., β_ρ} alone. Thus
A′ is uniquely determined by A and σ.

Theorem 5.1. Given any m × n matrix A of rank ρ, there exists a non-
singular m × m matrix Q such that A′ = Q⁻¹A has the following form:

(1) There is at least one non-zero element in each of the first ρ rows of A′,
and the elements in all remaining rows are zero.




(2) The first non-zero element appearing in row i (i ≤ ρ) is a 1 appearing
in column k_i, where k₁ < k₂ < ⋯ < k_ρ.

(3) In column k_i the only non-zero element is the 1 in row i.

The form A′ is uniquely determined by A.

proof. In the applications of this theorem that we wish to make, A is
usually given alone without reference to any bases A and B, and often without
reference to any linear transformation σ. We can, however, introduce any
two vector spaces U and V of dimensions n and m over F and let A be any
basis of U and B be any basis of V. We can consider A as defining a linear
transformation σ of U into V with respect to the bases A and B. The discussion
preceding Theorem 5.1 shows that there is at least one non-singular matrix
Q such that Q⁻¹A satisfies conditions (1), (2), and (3).

Now suppose there are two non-singular matrices Q₁ and Q₂ such that
Q₁⁻¹A = A′₁ and Q₂⁻¹A = A′₂ both satisfy the conditions of the theorem.
We wish to conclude that A′₁ = A′₂. No matter how the vector spaces U
and V are introduced and how the bases A and B are chosen we can regard
Q₁ and Q₂ as matrices of transition in V. Thus A′₁ represents σ with respect
to bases A and B′₁ and A′₂ represents σ with respect to bases A and B′₂. But
condition (3) says that for i ≤ ρ the ith basis element in both B′₁ and B′₂ is
σ(α_{k_i}). Thus the first ρ elements of B′₁ and B′₂ are identical. Condition (1) says
that the remaining basis elements have nothing to do with determining the
coefficients in A′₁ and A′₂. Thus A′₁ = A′₂. □

We say that a matrix satisfying the conditions of Theorem 5.1 is in
Hermite normal form. Often this form is called a row-echelon form. And
sometimes the term, Hermite normal form, is reserved for a square matrix
containing exactly the numbers that appear in the form we obtained in
Theorem 5.1 with the change that row i beginning with a 1 in column k_i
is moved down to row k_i. Thus each non-zero row begins on the main
diagonal and each column with a 1 on the main diagonal is otherwise zero.
In this text we have no particular need for this special form while the form
described in Theorem 5.1 is one of the most useful tools at our disposal.

The usefulness of the Hermite normal form depends on its form, and
the uniqueness of that form will enable us to develop effective and con-
venient short cuts for determining that form.

Definition. Given the matrix A, the matrix Aᵀ obtained from A by inter-
changing rows and columns in A is called the transpose of A. If Aᵀ = [a′_ij],
the element a′_ij appearing in row i, column j of Aᵀ is the element a_ji appear-
ing in row j, column i of A. It is easy to show that (AB)ᵀ = BᵀAᵀ. (See
Exercise 4.)

Proposition 5.2. The number of linearly independent rows in a matrix is
equal to the number of linearly independent columns.






proof. The number of linearly independent columns in a matrix A is its
rank ρ. The Hermite normal form A′ = Q⁻¹A corresponding to A is also
of rank ρ. For A′ it is obvious that the number of linearly independent rows
in A′ is also equal to ρ, that is, the rank of (A′)ᵀ is ρ. Since Qᵀ is non-
singular, the rank of Aᵀ = (QA′)ᵀ = (A′)ᵀQᵀ is also ρ. Thus the number
of linearly independent rows in A is ρ. □



(a) 



(b) 



EXERCISES 

1. Which of the following matrices are in Hermite normal form? 
"01001" 

10 1 
10 
0_ 
"00204" 
110 3 
12 
0_ 

"i o o o r 

10 1 

11 

0_ 
"0101001" 

10 10 

10 

0_ 
"10 10 1" 

110 

oooio 

lj 

2. Determine the rank of each of the matrices given in Exercise 1. 

3. Let a and r be linear transformations mapping R 3 into R 2 . Suppose that for 
a given pair of bases A for R 3 and B for R 2 , a and t are represented by 



(c) 



(d) 



(e) 



A = 



T 1 0' 
1 



and 



B = 



1 1" 
1 






respectively. Show that there is no basis B′ of R² such that B is the matrix represent-
ing σ with respect to A and B′.

4. Show that

(a) (A + B)ᵀ = Aᵀ + Bᵀ,

(b) (AB)ᵀ = BᵀAᵀ,

(c) (A⁻¹)ᵀ = (Aᵀ)⁻¹.



6 | Elementary Operations and Elementary Matrices

Our purpose in this section is to develop convenient computational 
methods. We have been concerned with the representations of linear 
transformations by matrices and the changes these matrices undergo when 
a basis is changed. We now show that these changes can be effected by 
elementary operations on the rows and columns of the matrices. 

We define three types of elementary operations on the rows of a matrix A. 

Type I : Multiply a row of A by a non-zero scalar. 
Type II : Add a multiple of one row to another row. 
Type III : Interchange two rows. 

Elementary column operations are defined in an analogous way. 

From a logical point of view these operations are redundant. An opera- 
tion of type III can be accomplished by a combination of operations of 
types I and II. It would, however, require four such operations to take the 
place of one operation of type III. Since we wish to develop convenient 
computational methods, it would not suit our purpose to reduce the number 
of operations at our disposal. On the other hand, it would not be of much 
help to extend the list of operations at this point. The student will find that, 
with practice, he can combine several elementary operations into one step. 
For example, such a combined operation would be the replacing of a row 
by a linear combination of rows, provided that the row replaced appeared 
in the linear combination with a non-zero coefficient. We leave such short 
cuts to the student. 

An elementary operation can also be accomplished by multiplying A on 
the left by a matrix. Thus, for example, multiplying the second row by the 
scalar c can be effected by the matrix 

    E₂(c) = [ 1  0  0  ⋯  0 ]
            [ 0  c  0  ⋯  0 ]
            [ 0  0  1  ⋯  0 ]                                   (6.1)
            [ ⋮           ⋮ ]
            [ 0  0  0  ⋯  1 ]






The addition of k times the third row to the first row can be effected by the
matrix

    E₃₁(k) = [ 1  0  k  ⋯  0 ]
             [ 0  1  0  ⋯  0 ]
             [ 0  0  1  ⋯  0 ]                                  (6.2)
             [ ⋮           ⋮ ]
             [ 0  0  0  ⋯  1 ]

The interchange of the first and second rows can be effected by the matrix

    E₁₂ = [ 0  1  0  ⋯  0 ]
          [ 1  0  0  ⋯  0 ]
          [ 0  0  1  ⋯  0 ]                                     (6.3)
          [ ⋮           ⋮ ]
          [ 0  0  0  ⋯  1 ]

These matrices corresponding to the elementary operations are called
elementary matrices. These matrices are all non-singular and their inverses
are also elementary matrices. For example, the inverses of E₂(c), E₃₁(k), and
E₁₂ are respectively E₂(c⁻¹), E₃₁(−k), and E₁₂.

Notice that the elementary matrix representing an elementary operation is
the matrix obtained by applying the elementary operation to the unit matrix.
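That observation gives a direct way to build the elementary matrices in code. The sketch below (Python with numpy; the function names are our own, and rows are numbered from 1 as in the text) constructs one matrix of each type by applying the operation to the identity and then performs the operation on A by multiplying on the left.

```python
import numpy as np

def E_scale(n, i, c):
    """Type I: multiply row i by the non-zero scalar c (rows numbered from 1)."""
    E = np.eye(n)
    E[i - 1, i - 1] = c
    return E

def E_add(n, i, j, k):
    """Type II: add k times row j to row i, written E_{ji}(k) in the text's notation."""
    E = np.eye(n)
    E[i - 1, j - 1] = k
    return E

def E_swap(n, i, j):
    """Type III: interchange rows i and j."""
    E = np.eye(n)
    E[[i - 1, j - 1]] = E[[j - 1, i - 1]]
    return E

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
print(E_add(3, 1, 3, 2.0) @ A)   # adds 2 times row 3 to row 1 of A
print(E_swap(3, 1, 2) @ A)       # interchanges rows 1 and 2 of A
```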

Theorem 6.1. Any non-singular matrix A can be written as a product of 
elementary matrices. 

proof. At least one element in the first column is non-zero or else A
would be singular. Our first goal is to apply elementary operations, if
necessary, to obtain a 1 in the upper left-hand corner. If a₁₁ = 0, we can
interchange rows to bring a non-zero element into that position. Thus we
may as well suppose that a₁₁ ≠ 0. We can then multiply the first row by
a₁₁⁻¹. Thus, to simplify notation, we may as well assume that a₁₁ = 1.
We now add −a_i1 times the first row to the ith row to make every other
element in the first column equal to zero.

The resulting matrix is still non-singular since the elementary operations
applied were non-singular. We now wish to obtain a 1 in the position of
element a₂₂. At least one element in the second column other than a₁₂
is non-zero for otherwise the first two columns would be dependent. Thus
by a possible interchange of rows, not including row 1, and multiplying the
second row by a non-zero scalar we can obtain a₂₂ = 1. We now add −a_i2
times the second row to the ith row to make every other element in the second
column equal to zero. Notice that we also obtain a 0 in the position of a₁₂
without affecting the 1 in the upper left-hand corner.

We continue in this way until we obtain the identity matrix. Thus if
E₁, E₂, ..., E_r are elementary matrices representing the successive elementary
operations, we have

    I = E_r ⋯ E₂E₁A,
or                                                              (6.4)
    A = E₁⁻¹E₂⁻¹ ⋯ E_r⁻¹. □

In Theorem 5.1 we obtained the Hermite normal form A′ from the matrix
A by multiplying on the left by the non-singular matrix Q⁻¹. We see now
that Q⁻¹ is a product of elementary matrices, and therefore that A can be
transformed into Hermite normal form by a succession of elementary row
operations. It is most efficient to use the elementary row operations directly
without obtaining the matrix Q⁻¹.

We could have shown directly that a matrix could be transformed into 
Hermite normal form by means of elementary row operations. We would 
then be faced with the necessity of showing that the Hermite normal form 
obtained is unique and not dependent on the particular sequence of oper- 
ations used. While this is not particularly difficult, the demonstration is 
uninteresting and unilluminating and so tedious that it is usually left as an 
"exercise for the reader." Uniqueness, however, is a part of Theorem 5.1, 
and we are assured that the Hermite normal form will be independent of 
the particular sequence of operations chosen. This is important as many 
possible operations are available at each step of the work, and we are free 
to choose those that are most convenient. 

Basically, the instructions for reducing a matrix to Hermite normal form 
are contained in the proof of Theorem 6.1. In that theorem, however, we 
were dealing with a non-singular matrix and thus assured that we could 
at certain steps obtain a non-zero element on the main diagonal. For a 
singular matrix, this is not the case. When a non-zero element cannot be 
obtained with the instructions given we must move our consideration to the 
next column. 
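The procedure just described, including the adjustment for singular matrices, can be written out as a short routine. The following is a sketch only (Python with numpy, floating-point arithmetic, and a tolerance of our own choosing), not the text's prescribed hand method; applied to the matrix of the worked example that follows, it produces the Hermite normal form obtained there.

```python
import numpy as np

def hermite_normal_form(A, tol=1e-12):
    """Reduce A to the Hermite normal form of Theorem 5.1 (row-echelon form
    with leading 1's and zeros above and below them), using only the three
    types of elementary row operations.  A sketch for floating-point input."""
    H = np.array(A, dtype=float)
    m, n = H.shape
    row = 0
    for col in range(n):
        # find a row at or below `row` with a non-zero entry in this column
        pivots = np.where(np.abs(H[row:, col]) > tol)[0]
        if pivots.size == 0:
            continue                      # move on to the next column
        p = row + pivots[0]
        H[[row, p]] = H[[p, row]]         # type III: interchange rows
        H[row] = H[row] / H[row, col]     # type I: make the leading element 1
        for r in range(m):
            if r != row:                  # type II: clear the rest of the column
                H[r] = H[r] - H[r, col] * H[row]
        row += 1
        if row == m:
            break
    return H

A = np.array([[ 4,  3,  2, -1,  4],
              [ 5,  4,  3, -1,  4],
              [-2, -2, -1,  2, -3],
              [11,  6,  4,  1, 11]])
print(hermite_normal_form(A))
```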

In the following example we perform several operations at each step to 
conserve space. When several operations are performed at once, some 
care must be exercised to avoid reducing the rank. This may occur, for 
example, if we subtract a row from itself in some hidden fashion. In this 
example we avoid this pitfall, which can occur when several operations of 






type III are combined, by considering one row as an operator row and adding 
multiples of it to several others. 
Consider the matrix 



    [  4   3   2  −1   4 ]
    [  5   4   3  −1   4 ]
    [ −2  −2  −1   2  −3 ]
    [ 11   6   4   1  11 ]

as an example.

According to the instructions for performing the elementary row oper-
ations we should multiply the first row by ¼. To illustrate another possible
way to obtain the "1" in the upper left corner, multiply row 1 by −1 and
add row 2 to row 1. Multiples of row 1 can now be added to the other rows
to obtain

    [ 1   1   1   0   0 ]
    [ 0  −1  −2  −1   4 ]
    [ 0   0   1   2  −3 ]
    [ 0  −5  −7   1  11 ]


Now, multiply row 2 by −1 and add appropriate multiples to the other
rows to obtain

    [ 1   0  −1  −1   4 ]
    [ 0   1   2   1  −4 ]
    [ 0   0   1   2  −3 ]
    [ 0   0   3   6  −9 ]

Finally, we obtain

    [ 1   0   0   1   1 ]
    [ 0   1   0  −3   2 ]
    [ 0   0   1   2  −3 ]
    [ 0   0   0   0   0 ]

which is the Hermite normal form described in Theorem 5.1. If desired,
Q⁻¹ can be obtained by applying the same sequence of elementary row
operations to the unit matrix. However, while the Hermite normal form
is necessarily unique, the matrix Q⁻¹ need not be unique, as the proof of
Theorem 5.1 should show.






Rather than trying to remember the sequence of elementary operations 
used to reduce A to Hermite normal form, it is more efficient to perform 
these operations on the unit matrix at the same time we are operating on 
A. It is suggested that we arrange the work in the following way: 



    [  4   3   2  −1   4 |  1    0   0   0 ]
    [  5   4   3  −1   4 |  0    1   0   0 ]
    [ −2  −2  −1   2  −3 |  0    0   1   0 ]      = [A, I]
    [ 11   6   4   1  11 |  0    0   0   1 ]

    [  1   1   1   0   0 | −1    1   0   0 ]
    [  0  −1  −2  −1   4 |  5   −4   0   0 ]
    [  0   0   1   2  −3 | −2    2   1   0 ]
    [  0  −5  −7   1  11 | 11  −11   0   1 ]

    [  1   0  −1  −1   4 |   4  −3   0   0 ]
    [  0   1   2   1  −4 |  −5   4   0   0 ]
    [  0   0   1   2  −3 |  −2   2   1   0 ]
    [  0   0   3   6  −9 | −14   9   0   1 ]

    [  1   0   0   1   1 |  2   −1   1   0 ]
    [  0   1   0  −3   2 | −1    0  −2   0 ]
    [  0   0   1   2  −3 | −2    2   1   0 ]
    [  0   0   0   0   0 | −8    3  −3   1 ]



In the end we obtain

    Q⁻¹ = [  2  −1   1   0 ]
          [ −1   0  −2   0 ]
          [ −2   2   1   0 ]
          [ −8   3  −3   1 ]

Verify directly that Q⁻¹A is in Hermite normal form.
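The verification requested above can also be done by machine (numpy used for convenience); the product Q⁻¹A reproduces the Hermite normal form found earlier.

```python
import numpy as np

A = np.array([[ 4,  3,  2, -1,  4],
              [ 5,  4,  3, -1,  4],
              [-2, -2, -1,  2, -3],
              [11,  6,  4,  1, 11]])
Q_inv = np.array([[ 2, -1,  1, 0],
                  [-1,  0, -2, 0],
                  [-2,  2,  1, 0],
                  [-8,  3, -3, 1]])

print(Q_inv @ A)
# [[ 1  0  0  1  1]
#  [ 0  1  0 -3  2]
#  [ 0  0  1  2 -3]
#  [ 0  0  0  0  0]]
```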

If A were non-singular, the Hermite normal form obtained would be the
identity matrix. In this case Q⁻¹ would be the inverse of A. This method
of finding the inverse of a matrix is one of the easiest available for hand
computation. It is the recommended technique.






EXERCISES 

1. Elementary operations provide the easiest methods for determining the rank 
of a matrix. Proceed as if reducing to Hermite normal form. Actually, it is not 
necessary to carry out all the steps as the rank is usually evident long before the 
Hermite normal form is obtained. Find the ranks of the following matrices : 



(a) 



(b) 



(c) 



'1 2 3 
4 5 6 
7 8 9 
' 1 
-1 
-2 -3 



"0 


1 


2" 


1 





3 


2 


3 






2. Identify the elementary operations represented by the following elementary 
matrices : 



(a) 



(b) 



(c) 



1 


-2 



"0 





r 





1 





1 









"i 





0" 





2 











1 



3. Show that the product

    [ −1  0 ] [ 1  0 ] [ 1  −1 ] [ 1  0 ]
    [  0  1 ] [ 1  1 ] [ 0   1 ] [ 1  1 ]

is an elementary matrix. Identify the elementary operations represented by each
matrix in the product.

4. Show by an example that the product of elementary matrices is not necessarily 
an elementary matrix. 






5. Reduce each of the following matrices to Hermite normal form. 



(a) 



(b) 



"2 1 
2 -1 



3 -2' 
5 2 



1 1 1 

12 3 3 

2 10 

2 2 2 1 



1 

10 6" 

2 3 

5 5 



-113 2 5 2_ 

6. Use elementary row operations to obtain the inverses of

    (a) [  3  −1 ]        and        (b) [ 1  2  3 ]
        [ −5   2 ]                       [ 2  3  4 ]
                                         [ 3  4  6 ]

7. (a) Show that, by using a sequence of elementary operations of type II only, 
any two rows of a matrix can be interchanged with one of the two rows multiplied 
by —1. (In fact, the type II operations involve no scalars other than ±1.) 

(b) Using the results of part (a), show that a type III operation can be obtained 
by a sequence of type II operations and a single type I operation. 

(c) Show that the sign of any row can be changed by a sequence of type II 
operations and a single type III operation. 

8. Show that any matrix A can be reduced to the form described in Theorem 4.1 
by a sequence of elementary row operations and a sequence of elementary column 
operations. 

7 | Linear Problems and Linear Equations

For a given linear transformation σ of U into V and a given β ∈ V the
problem of finding any or all ξ ∈ U for which σ(ξ) = β is called a linear
problem. Before providing any specific methods for solving such problems,
let us see what the set of solutions should look like.

If β ∉ σ(U), then the problem has no solution.

If β ∈ σ(U), the problem has at least one solution. Let ξ₀ be one such
solution. We call any such ξ₀ a particular solution. If ξ is any other solution,
then σ(ξ − ξ₀) = σ(ξ) − σ(ξ₀) = β − β = 0 so that ξ − ξ₀ is in the kernel
of σ. Conversely, if ξ − ξ₀ is in the kernel of σ then σ(ξ) = σ(ξ₀ + ξ − ξ₀) =
σ(ξ₀) + σ(ξ − ξ₀) = β + 0 = β so that ξ is a solution. Thus the set of all
solutions of σ(ξ) = β is of the form

    {ξ₀} + K(σ).                                                (7.1)



Since {ξ₀} contains just one element, there is a one-to-one correspondence
between the elements of K(σ) and the elements of {ξ₀} + K(σ). Thus the
size of the set of solutions can be described by giving the dimension of K(σ).
The set of all solutions of the problem σ(ξ) = β is not a subspace of U unless
β = 0. Nevertheless, it is convenient to say that the set is of dimension ν,
the nullity of σ.

Given the linear problem σ(ξ) = β, the problem σ(ξ) = 0 is called the
associated homogeneous problem. The general solution is then any particular
solution plus the solution of the associated homogeneous problem. The
solution of the associated homogeneous problem is the kernel of σ.

Now let σ be represented by the m × n matrix A = [a_ij], β be represented
by B = (b₁, ..., b_m), and ξ by X = (x₁, ..., xₙ). Then the linear problem
σ(ξ) = β becomes

    AX = B                                                      (7.2)

in matrix form, or

    Σ_{j=1}^n a_ij x_j = b_i     (i = 1, ..., m)                (7.3)

in the form of a system of linear equations.

Given A and B, the augmented matrix [A, B] of the system of linear
equations is defined to be

    [A, B] = [ a_11   ⋯   a_1n   b₁  ]
             [  ⋮            ⋮    ⋮  ]                          (7.4)
             [ a_m1   ⋯   a_mn   b_m ]



Theorem 7.1. The system of simultaneous linear equations AX = B has a
solution if and only if the rank of A is equal to the rank of the augmented
matrix [A, B]. Whenever a solution exists, all solutions can be expressed in
terms of ν = n − ρ independent parameters, where ρ is the rank of A.

proof. We have already seen that the linear problem σ(ξ) = β has a
solution if and only if β ∈ σ(U). This is the case if and only if β is linearly
dependent on {σ(α₁), ..., σ(αₙ)}. But this is equivalent to the condition
that B be linearly dependent on the columns of A. Thus adjoining the
column of b_i's to form the augmented matrix must not increase the rank.
Since the rank of the augmented matrix cannot be less than the rank of A
we see that the system has a solution if and only if these two ranks are equal.

Now let Q be a non-singular matrix such that Q⁻¹A = A′ is in Hermite
normal form. Any solution of AX = B is also a solution of A′X = Q⁻¹AX =
Q⁻¹B = B′. Conversely, any solution of A′X = B′ is also a solution of
AX = QA′X = QB′ = B. Thus the two systems of equations are equivalent.





Now the system A′X = B′ is particularly easy to solve since the variable xₖᵢ appears only in the ith equation. Furthermore, non-zero coefficients appear only in the first ρ equations. The condition that β ∈ σ(U) also takes on a form that is easily recognizable. The condition that B′ be expressible as a linear combination of the columns of A′ is simply that the elements of B′ below row ρ be zero. The system A′X = B′ has the form

x_{k₁} + a′_{1,k₁+1}x_{k₁+1} + ··· + a′_{1,k₂+1}x_{k₂+1} + ··· = b′₁
         x_{k₂} + a′_{2,k₂+1}x_{k₂+1} + ··· = b′₂
                  ⋮                                      (7.5)
         x_{k_ρ} + ··· = b′_ρ

Since each xₖᵢ appears in but one equation with unit coefficient, the remaining n − ρ unknowns can be given values arbitrarily and the corresponding values of the xₖᵢ computed. The n − ρ unknowns with indices not among the kᵢ are the n − ρ parameters mentioned in the theorem. □

As an example, consider the system of equations:

4x₁ + 3x₂ + 2x₃ − x₄ = 4
5x₁ + 4x₂ + 3x₃ − x₄ = 4
−2x₁ − 2x₂ − x₃ + 2x₄ = −3
11x₁ + 6x₂ + 4x₃ + x₄ = 11.

The augmented matrix is

|  4   3   2  −1    4 |
|  5   4   3  −1    4 |
| −2  −2  −1   2   −3 |
| 11   6   4   1   11 |

This is the matrix we chose for an example in the previous section. There we obtained the Hermite normal form

| 1  0  0   1    1 |
| 0  1  0  −3    2 |
| 0  0  1   2   −3 |
| 0  0  0   0    0 |

Thus the system of equations A′X = B′ corresponding to this augmented matrix is

x₁ + x₄ = 1
x₂ − 3x₄ = 2
x₃ + 2x₄ = −3.




It is clear that this system is very easy to solve. We can take any value whatever for x₄ and compute the corresponding values for x₁, x₂, and x₃. A particular solution, obtained by taking x₄ = 0, is X = (1, 2, −3, 0). It is more instructive to write the new system of equations in the form

x₁ = 1 − x₄
x₂ = 2 + 3x₄
x₃ = −3 − 2x₄
x₄ = x₄.

In vector form this becomes

(x₁, x₂, x₃, x₄) = (1, 2, −3, 0) + x₄(−1, 3, −2, 1).

We can easily verify that (−1, 3, −2, 1) is a solution of the associated homogeneous problem. In fact, {(−1, 3, −2, 1)} is a basis for the kernel, and x₄(−1, 3, −2, 1), for an arbitrary x₄, is a general element of the kernel. We have, therefore, expressed the general solution as a particular solution plus the kernel.
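The same reduction is easy to reproduce by machine. The following sketch (not from the text) uses the sympy library, an assumed dependency, to reduce the augmented matrix of this example and read off the particular solution and a basis for the kernel.

from sympy import Matrix

A = Matrix([[4, 3, 2, -1],
            [5, 4, 3, -1],
            [-2, -2, -1, 2],
            [11, 6, 4, 1]])
B = Matrix([4, 4, -3, 11])

R, pivots = A.row_join(B).rref()   # Hermite normal form of the augmented matrix [A, B]
print(R)                           # rows (1 0 0 1 | 1), (0 1 0 -3 | 2), (0 0 1 2 | -3), and a zero row
print(A.nullspace())               # kernel of the associated homogeneous problem: [(-1, 3, -2, 1)]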

The elementary row operations provide us with the recommended technique 
for solving simultaneous linear equations by hand. This application is the 
principal reason for introducing elementary row operations rather than 
column operations. 

Theorem 7.2. The equation AX = B fails to have a solution if and only if there exists a one-row matrix C such that CA = 0 and CB = 1.

proof. Suppose the equation AX = B has a solution and a C exists such that CA = 0 and CB = 1. Then we would have 0 = (CA)X = C(AX) = CB = 1, which is a contradiction.

On the other hand, suppose the equation AX = B has no solution. By Theorem 7.1 this implies that the rank of the augmented matrix [A, B] is greater than the rank of A. Let Q be a non-singular matrix such that Q⁻¹[A, B] is in Hermite normal form. Then if ρ is the rank of A, the (ρ + 1)st row of Q⁻¹[A, B] must be all zeros except for a 1 in the last column. If C is the (ρ + 1)st row of Q⁻¹ this means that

C[A, B] = [0  0  ···  0  1],

or

CA = 0 and CB = 1. □

This theorem is important because it provides a positive condition for a 
negative conclusion. Theorem 7.1 also provides such a positive condition 
and it is to be preferred when dealing with a particular system of equations. 
But Theorem 7.2 provides a more convenient condition when dealing with 
systems of equations in general. 
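As a quick illustration of Theorem 7.2, the sketch below (not from the text) uses sympy with a small hypothetical inconsistent system — the matrices A and B are my own choice — and produces such a row C from the left null space of A.

from sympy import Matrix

A = Matrix([[1, 2], [2, 4], [1, 1]])    # hypothetical coefficient matrix
B = Matrix([1, 3, 1])                   # chosen so that AX = B has no solution

# rows C with CA = 0 are exactly the transposes of vectors in the null space of A.T
for v in A.T.nullspace():
    C = v.T
    if (C * B)[0] != 0:
        C = C / (C * B)[0]              # rescale so that CB = 1
        break

print(C * A, C * B)                     # a zero row and [1]: the certificate of Theorem 7.2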

Although the systems of linear equations in the exercises that follow are written in expanded form, they are equivalent in form to the matric equation




AX = B. From any linear problem in this set, or those that will occur later, it is possible to obtain an extensive list of closely related linear problems that appear to be different. For example, if AX = B is the given linear problem with A an m × n matrix and Q is any non-singular m × m matrix, then A′X = B′ with A′ = QA and B′ = QB is a problem with the same set of solutions. If P is a non-singular n × n matrix, then A″X″ = B where A″ = AP is a problem whose solution X″ is related to the solution X of the original problem by the condition X″ = P⁻¹X.

For the purpose of constructing related exercises of the type mentioned, 
it is desirable to use matrices P and Q that do not introduce tedious numerical 
calculations. It is very easy to obtain a non-singular matrix P that has only 
integral elements and such that its inverse also has only integral elements. 
Start with an identity matrix of the desired order and perform a sequence of 
elementary operations of types II and III. As long as an operation of type I is 
avoided, no fractions will be introduced. Furthermore, the inverse opera- 
tions will be of types II and III so the inverse matrix will also have only 
integral elements. 
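A sketch of this recipe (not from the text), using sympy as an assumed dependency: starting from the identity, apply only operations of type II (adding an integer multiple of one row to another) and type III (interchanging two rows), so both the resulting matrix and its inverse have integral elements.

import random
from sympy import Matrix, eye

def random_integral_transition(n, steps=20, rng=random.Random(0)):
    P = Matrix(eye(n))
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        if rng.random() < 0.3:
            P.row_swap(i, j)                                    # type III: interchange two rows
        else:
            P[i, :] = P[i, :] + rng.randint(-2, 2) * P[j, :]    # type II: add an integer multiple of a row
    return P

P = random_integral_transition(4)
print(P)
print(P.inv())   # also integral, since the inverse operations are again of types II and III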

For convenience, some matrices with integral elements and inverses with 
integral elements are listed in an appendix. For some of the exercises that 
are given later in this book, matrices of transition that satisfy special con- 
ditions are also needed. These matrices, known as orthogonal and unitary 
matrices, usually do not have integral elements. Simple matrices of these 
types are somewhat harder to obtain. Some matrices of these types are also 
listed in the appendix. 

EXERCISES 

1. Show that {(1, 1, 1, 0), (2, 1, 0, 1)} spans the subspace of all solutions of the system of linear equations

3x₁ − 2x₂ − x₃ − 4x₄ = 0
x₁ + x₂ − 2x₃ − 3x₄ = 0.

2. Find the subspace of all solutions of the system of linear equations

x₁ + 2x₂ − 3x₃ + x₄ = 0
3x₁ − x₂ + 5x₃ − x₄ = 0
2x₁ + x₂ + x₄ = 0.

3. Find all solutions of the following two systems of non-homogeneous linear equations.

(a) x₁ + 3x₂ + 5x₃ − 2x₄ = 11
    3x₁ − 2x₂ − 7x₃ + 5x₄ = 0
    2x₁ + x₂ + x₄ = 7,

(b) x₁ + 3x₂ + 2x₃ + 5x₄ = 10
    3x₁ − 2x₂ − 5x₃ + 4x₄ = −5
    2x₁ + x₂ − x₃ + 5x₄ = 5.




4. Find all solutions of the following system of non-homogeneous linear equations 

ZtX-t *^2 "^3 — ■*■ 

Ju i ~ ~ JC o ~T~ £•& o — — "" ~~ ^ 
^tJU-% J&q i~ *& q — — *" "" J 

5. Find all solutions of the system of equations

7x₁ + 3x₂ + 21x₃ − 13x₄ + x₅ = −14
10x₁ + 3x₂ + 30x₃ − 16x₄ + x₅ = −23
7x₁ + 2x₂ + 21x₃ − 11x₄ + x₅ = −16
9x₁ + 3x₂ + 27x₃ − 15x₄ + x₅ = −20.

6. Theorem 7.1 states that a necessary and sufficient condition for the existence 
of a solution of a system of simultaneous linear equations is that the rank of the 
augmented matrix be equal to the rank of the coefficient matrix. The most efficient 
way to determine the rank of each of these matrices is to reduce each to Hermite 
normal form. The reduction of the augmented matrix to normal form, however, 
automatically produces the reduced form of the coefficient matrix. How, and 
where? How is the comparison of the ranks of the coefficient matrix and the 
augmented matrix evident from the appearance of the reduced form of the aug- 
mented matrix ? 

7. The differential equation d²y/dx² + 4y = sin x has the general solution y = C₁ sin 2x + C₂ cos 2x + ⅓ sin x. Identify the associated homogeneous problem, the solution of the associated homogeneous problem, and the particular solution.

8 | Other Applications of the Hermite Normal Form

The Hermite normal form and the elementary row operations provide 
techniques for dealing with problems we have already encountered and 
handled rather awkwardly. 

A Standard Basis for a Subspace 

Let A = {α₁, ..., αₙ} be a basis of U and let W be a subspace of U spanned by the set B = {β₁, ..., βᵣ}. Since every subspace of U is spanned by a finite set, it is no restriction to assume that B is finite. Let βᵢ = Σⱼ₌₁ⁿ bᵢⱼαⱼ so that (bᵢ₁, ..., bᵢₙ) is the n-tuple representing βᵢ. Then in the matrix B = [bᵢⱼ] each row is the representation of a vector in B. Now suppose an elementary row operation is applied to B to obtain B′. Every row of B′ is a linear combination of the rows of B and, since an elementary row operation has an inverse, every row of B is a linear combination of the rows of B′. Thus the rows of B and the rows of B′ represent sets spanning the same subspace W. We can therefore reduce B to Hermite normal form and obtain a particular set spanning W. Since the non-zero rows of the Hermite normal form are linearly independent, they form a basis of W.




Now let C be another set spanning W. In a similar fashion we can construct a matrix C whose rows represent the vectors in C and reduce this matrix to Hermite normal form. Let C′ be the Hermite normal form obtained from C, and let B′ be the Hermite normal form obtained from B. We do not assume that B and C have the same number of elements, and therefore B′ and C′ do not necessarily have the same number of rows. However, in each the number of non-zero rows must be equal to the dimension of W. We claim that the non-zero rows in these two normal forms are identical.

To see this, construct a new matrix with the non-zero rows of C′ written beneath the non-zero rows of B′ and reduce this matrix to Hermite normal form. Since the rows of C′ are dependent on the rows of B′, the rows of C′ can be removed by elementary operations, leaving the rows of B′. Further reduction is not possible since B′ is already in normal form. But by interchanging rows, which are elementary operations, we can obtain a matrix in which the non-zero rows of B′ are beneath the non-zero rows of C′. As before, we can remove the rows of B′, leaving the non-zero rows of C′ as the normal form. Since the Hermite normal form is unique, we see that the non-zero rows of B′ and C′ are identical. The basis that we obtain from the non-zero rows of the Hermite normal form is the standard basis with respect to A for the subspace W.

This gives us an effective method for deciding when two sets span the same subspace. For example, in Chapter I-4, Exercise 5, we were asked to show that {(1, 1, 0, 0), (1, 0, 1, 1)} and {(2, −1, 3, 3), (0, 1, −1, −1)} span the same space. In either case we obtain {(1, 0, 1, 1), (0, 1, −1, −1)} as the standard basis.
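A sketch of this check (not from the text) with sympy, an assumed dependency: the non-zero rows of the row-reduced (Hermite) normal form are the standard basis, so two spanning sets span the same subspace exactly when they produce the same rows.

from sympy import Matrix

def standard_basis(vectors):
    R, _ = Matrix(vectors).rref()                     # Hermite normal form
    return [list(R.row(i)) for i in range(R.rows) if any(R.row(i))]

print(standard_basis([[1, 1, 0, 0], [1, 0, 1, 1]]))
print(standard_basis([[2, -1, 3, 3], [0, 1, -1, -1]]))
# both give [[1, 0, 1, 1], [0, 1, -1, -1]], so the two sets span the same subspace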

The Sum of Two Subspaces 

If A₁ is a subset spanning W₁ and A₂ is a subset spanning W₂, then A₁ ∪ A₂ spans W₁ + W₂ (Chapter I, Proposition 4.4). Thus we can find a basis for W₁ + W₂ by constructing a large matrix whose rows are the representations of the vectors in A₁ ∪ A₂ and reducing it to Hermite normal form by elementary row operations.

The Characterization of a Subspace by a Set of Homogeneous 
Linear Equations 

We have already seen that the set of all solutions of a system of homo- 
geneous linear equations is a subspace, the kernel of the linear transformation 
represented by the matrix of coefficients. The method for solving such a 
system which we described in Section 7 amounts to passing from a charac- 
terization of a subspace as the set of all solutions of a system of equations 
to its description as the set of all linear combinations of a basis. The question 




naturally arises: If we are given a spanning set for a subspace W, how can 
we find a system of simultaneous homogeneous linear equations for which W 
is exactly the set of solutions ? 

This is not at all difficult and no new procedures are required. All that is needed is a new look at what we have already done. Consider the homogeneous linear equation a₁x₁ + ··· + aₙxₙ = 0. There is no significant difference between the aᵢ's and the xᵢ's in this equation; they appear symmetrically. Let us exploit this symmetry systematically.

If a₁x₁ + ··· + aₙxₙ = 0 and b₁x₁ + ··· + bₙxₙ = 0 are two homogeneous linear equations, then (a₁ + b₁)x₁ + ··· + (aₙ + bₙ)xₙ = 0 is a homogeneous linear equation, as also is aa₁x₁ + ··· + aaₙxₙ = 0 where a ∈ F. Thus we can consider the set of all homogeneous linear equations in n unknowns as a vector space over F. The equation a₁x₁ + ··· + aₙxₙ = 0 is represented by the n-tuple (a₁, ..., aₙ).

When we write a matrix to represent a system of equations and reduce that matrix to Hermite normal form, we are finding a standard basis for the subspace of the vector space of all homogeneous linear equations in x₁, ..., xₙ spanned by this system of equations, just as we did in the first part of this section for a set of vectors spanning a subspace. The rank of the system of equations is the dimension of the subspace of equations spanned by the given system.

Now let W be a subspace given by a spanning set and solve for the subspace E of all equations satisfied by W. Then solve for the subspace of solutions of the system of equations E. W must be a subspace of the set of all solutions. Let W be of dimension ν. By Theorem 7.1 the dimension of E is n − ν. Then, in turn, the dimension of the set of all solutions of E is n − (n − ν) = ν. Thus W must be exactly the space of all solutions. Thus W and E characterize each other.

If we start with a system of equations and solve it by means of the Hermite 
normal form, as described in Section 7, we obtain in a natural way a basis 
for the subspace of solutions. This basis, however, will not be the standard 
basis. We can obtain full symmetry between the standard system of equations 
and the standard basis by changing the definition of the standard basis. 
Instead of applying the elementary row operations by starting with the left- 
hand column, start with the right-hand column. If the basis obtained in this 
way is called the standard basis, the equations obtained will be the standard 
equations, and the solution of the standard equations will be the standard 
basis. In the following example the computations will be carried out in this 
way to illustrate this idea. It is not recommended, however, that this be 
generally done since accuracy with one definite routine is more important. 

Let

W = ⟨(1, 0, −3, 11, −5), (3, 2, 5, −5, 3), (1, 1, 2, −4, 2), (7, 2, 12, 1, 2)⟩.

We now find a standard basis by reducing

|  1   0  −3  11  −5 |
|  3   2   5  −5   3 |
|  1   1   2  −4   2 |
|  7   2  12   1   2 |

to the form

| 2  0  5  0  1 |
| 1  0  2  1  0 |
| 1  1  0  0  0 |
| 0  0  0  0  0 |


From this we see that the coefficients of our systems of equations satisfy the conditions

2a₁ + 5a₃ + a₅ = 0
a₁ + 2a₃ + a₄ = 0
a₁ + a₂ = 0.

The coefficients a₁ and a₃ can be selected arbitrarily and the others computed from them. In particular, we have

(a₁, a₂, a₃, a₄, a₅) = a₁(1, −1, 0, −1, −2) + a₃(0, 0, 1, −2, −5).

The 5-tuples (1, −1, 0, −1, −2) and (0, 0, 1, −2, −5) represent the two standard linear equations

x₁ − x₂ − x₄ − 2x₅ = 0
x₃ − 2x₄ − 5x₅ = 0.

The reader should check that the vectors in W actually satisfy these equations and that the standard basis for W is obtained.

The Intersection of Two Subspaces 

Let W₁ and W₂ be subspaces of U of dimensions ν₁ and ν₂, respectively, and let W₁ ∩ W₂ be of dimension ν. Then W₁ + W₂ is of dimension ν₁ + ν₂ − ν. Let E₁ and E₂ be the spaces of equations characterizing W₁ and W₂. As we have seen, E₁ is of dimension n − ν₁ and E₂ is of dimension n − ν₂. Let the dimension of E₁ + E₂ be ρ. Then E₁ ∩ E₂ is of dimension (n − ν₁) + (n − ν₂) − ρ = 2n − ν₁ − ν₂ − ρ.

Since the vectors in W₁ ∩ W₂ satisfy the equations in both E₁ and E₂, they satisfy the equations in E₁ + E₂. Thus ν ≤ n − ρ. On the other hand,




W₁ and W₂ both satisfy the equations in E₁ ∩ E₂ so that W₁ + W₂ satisfies the equations in E₁ ∩ E₂. Thus ν₁ + ν₂ − ν ≤ n − (2n − ν₁ − ν₂ − ρ) = ν₁ + ν₂ + ρ − n. A comparison of these two inequalities shows that ν = n − ρ and hence that W₁ ∩ W₂ is characterized by E₁ + E₂.

Given W₁ and W₂, the easiest way to find W₁ ∩ W₂ is to determine E₁ and E₂ and then E₁ + E₂. From E₁ + E₂ we can then find W₁ ∩ W₂. In effect, this involves solving three systems of equations, and reducing to Hermite normal form three times, but it is still easier than a direct assault on the problem.

As an example consider Exercise 8 of Chapter I-4. Let W₁ = ⟨(1, 2, 3, 6), (4, −1, 3, 6), (5, 1, 6, 12)⟩ and W₂ = ⟨(1, −1, 1, 1), (2, −1, 4, 5)⟩. Using the Hermite normal form, we find that E₁ = ⟨(−2, −2, 0, 1), (−1, −1, 1, 0)⟩ and E₂ = ⟨(−4, −3, 0, 1), (−3, −2, 1, 0)⟩. Again using the Hermite normal form, we find that the standard basis for E₁ + E₂ is {(1, 0, 0, ½), (0, 1, 0, −1), (0, 0, 1, −½)}. And from this we find quite easily that W₁ ∩ W₂ = ⟨(−½, 1, ½, 1)⟩.
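The same three reductions can be scripted. A sketch (not from the text) with sympy, an assumed dependency: E₁ and E₂ are null spaces, and W₁ ∩ W₂ is the space of solutions of the combined system E₁ + E₂.

from sympy import Matrix

W1 = Matrix([[1, 2, 3, 6], [4, -1, 3, 6], [5, 1, 6, 12]])
W2 = Matrix([[1, -1, 1, 1], [2, -1, 4, 5]])

E1 = [list(a.T) for a in W1.nullspace()]     # equations characterizing W1
E2 = [list(a.T) for a in W2.nullspace()]     # equations characterizing W2

E = Matrix(E1 + E2)                          # a spanning set for E1 + E2
print([list(v.T) for v in E.nullspace()])    # W1 ∩ W2: one vector, a multiple of (-1/2, 1, 1/2, 1)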

Let B = {β₁, β₂, ..., βₙ} be a given finite set of vectors. We wish to solve the problem posed in Theorem 2.2 of Chapter I. How do we show that some βₖ is a linear combination of the βᵢ with i < k; or how do we show that no βₖ can be so represented?

We are looking for a relation of the form

βₖ = Σᵢ₌₁ᵏ⁻¹ xᵢβᵢ.                                       (8.1)

This is not a meaningful numerical problem unless B is a given specific set. This usually means that the βᵢ are given in terms of some coordinate system, relative to some given basis. But the relation (8.1) is independent of any coordinate system so we are free to choose a different coordinate system if this will make the solution any easier. It turns out that the tools to solve this problem are available.

Let A = {α₁, ..., αₘ} be the given basis and let

βⱼ = Σᵢ₌₁ᵐ aᵢⱼαᵢ,   j = 1, ..., n.                       (8.2)

If A′ = {α′₁, ..., α′ₘ} is the new basis (which we have not specified yet), we would have

βⱼ = Σᵢ₌₁ᵐ a′ᵢⱼα′ᵢ,   j = 1, ..., n.                     (8.3)

What is the relation between A = [aᵢⱼ] and A′ = [a′ᵢⱼ]? If P is the matrix of transition from the basis A to the basis A′, by formula (4.3) we see that

A = PA′.                                                (8.4)




Since P is non-singular, it can be represented as a product of elementary matrices. This means A′ can be obtained from A by a sequence of elementary row operations.

The solution to (8.1) is now most conveniently obtained if we take A′ to be in Hermite normal form. Suppose that A′ is in Hermite normal form and use the notation given in Theorem 5.1. Then for βₖᵢ we would have

βₖᵢ = α′ᵢ                                               (8.5)

and for j between kᵣ and kᵣ₊₁ we would have

βⱼ = Σᵢ₌₁ʳ a′ᵢⱼα′ᵢ = Σᵢ₌₁ʳ a′ᵢⱼβₖᵢ.                       (8.6)

Since kᵢ ≤ kᵣ < j, this last expression is a relation of the required form. (Actually, every linear relation that exists among the βᵢ can be obtained from those in (8.6). This assertion will not be used later in the book so we will not take space to prove it. Consider it "an exercise for the reader.")

Since the columns of A and A′ represent the vectors in B, the rank of A is equal to the number of vectors in a maximal linearly independent subset of B. Thus, if B is linearly independent the rank of A will be n. This means that the Hermite normal form of A will either show that B is linearly independent or reveal a linear relation in B if it is dependent.

For example, consider the set {(1, 0, −3, 11, −5), (3, 2, 5, −5, 3), (1, 1, 2, −4, 2), (7, 2, 12, 1, 2)}. The implied context is that a basis A = {α₁, ..., α₅} is considered to be given and that β₁ = α₁ − 3α₃ + 11α₄ − 5α₅, etc. According to (8.2) the appropriate matrix is

|  1   3   1   7 |
|  0   2   1   2 |
| −3   5   2  12 |
| 11  −5  −4   1 |
| −5   3   2   2 |

which reduces to the Hermite normal form

| 1  0  0   −3/4 |
| 0  1  0   23/4 |
| 0  0  1  −19/2 |
| 0  0  0      0 |
| 0  0  0      0 |


It is easily checked that −(3/4)(1, 0, −3, 11, −5) + (23/4)(3, 2, 5, −5, 3) − (19/2)(1, 1, 2, −4, 2) = (7, 2, 12, 1, 2).
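A sketch of this test (not from the text) with sympy, an assumed dependency: the pivot columns of the Hermite normal form identify a maximal independent subset, and any non-pivot column displays its vector as a combination of the earlier pivot vectors.

from sympy import Matrix

# columns represent beta_1, ..., beta_4 of the example
A = Matrix([[1, 3, 1, 7],
            [0, 2, 1, 2],
            [-3, 5, 2, 12],
            [11, -5, -4, 1],
            [-5, 3, 2, 2]])

H, pivots = A.rref()
print(pivots)     # (0, 1, 2): beta_1, beta_2, beta_3 are independent, beta_4 is not
print(H[:, 3])    # coefficients -3/4, 23/4, -19/2 expressing beta_4 in terms of the first three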



EXERCISES 

1. Determine which of the following sets in R⁴ are linearly independent over R.

(a) {(1, 1, 0, 1), (1, −1, 1, 1), (2, 2, 1, 2), (0, 1, 0, 0)}.
(b) {(1, 0, 0, 1), (0, 1, 1, 0), (1, 0, 1, 0), (0, 1, 0, 1)}.
(c) {(1, 0, 0, 1), (0, 1, 0, 1), (0, 0, 1, 1), (1, 1, 1, 1)}.

This problem is identical to Exercise 8, Chapter I-2.

2. Let W be the subspace of R⁵ spanned by {(1, 1, 1, 1, 1), (1, 0, 1, 0, 1), (0, 1, 1, 1, 0), (2, 0, 0, 1, 1), (2, 1, 1, 2, 1), (1, −1, −1, −2, 2), (1, 2, 3, 4, −1)}. Find a standard basis for W and the dimension of W. This problem is identical to Exercise 6, Chapter I-4.

3. Show that {(1, −1, 2, −3), (1, 1, 2, 0), (3, −1, 6, −6)} and {(1, 0, 1, 0), (0, 2, 0, 3)} do not span the same subspace. This problem is identical to Exercise 7, Chapter I-4.

4. If W₁ = ⟨(1, 1, 3, −1), (1, 0, −2, 0), (3, 2, 4, −2)⟩ and W₂ = ⟨(1, 0, 0, 1), (1, 1, 7, 1)⟩, determine the dimension of W₁ + W₂.

5. Let W = ⟨(1, −1, −3, 0, 1), (2, 1, 0, −1, 4), (3, 1, −1, 1, 8), (1, 2, 3, 2, 6)⟩. Determine the standard basis for W. Find a set of linear equations which characterize W.

6. Let W₁ = ⟨(1, 2, 3, 6), (4, −1, 3, 6), (5, 1, 6, 12)⟩ and W₂ = ⟨(1, −1, 1, 1), (2, −1, 4, 5)⟩ be subspaces of R⁴. Find bases for W₁ ∩ W₂ and W₁ + W₂. Extend the basis of W₁ ∩ W₂ to a basis of W₁ and extend the basis of W₁ ∩ W₂ to a basis of W₂. From these bases obtain a basis of W₁ + W₂. This problem is identical to Exercise 8, Chapter I-4.

9 | Normal Forms

To understand fully what a normal form is, we must first introduce the concept of an equivalence relation. We say that a relation is defined in a set if, for each pair (a, b) of elements in this set, it is decided that "a is related to b" or "a is not related to b." If a is related to b, we write a ~ b. An equivalence relation in a set S is a relation in S satisfying the following laws:

Reflexive law: a ~ a.
Symmetric law: If a ~ b, then b ~ a.
Transitive law: If a ~ b and b ~ c, then a ~ c.

If for an equivalence relation we have a ~ b, we say that a is equivalent to b.




Examples. Among rational fractions we can define a/b ~ c/d (for a, b, c, d integers) if and only if ad = bc. This is the ordinary definition of equality in rational numbers, and this relation satisfies the three conditions of an equivalence relation.

In geometry we do not ordinarily say that a straight line is parallel to 
itself. But if we agree to say that a straight line is parallel to itself, the 
concept of parallelism is an equivalence relation among the straight lines 
in the plane or in space. 

Geometry has many equivalence relations: congruence of triangles, 
similarity of triangles, the concept of projectivity in projective geometry, 
etc. In dealing with time we use many equivalence relations: same hour 
of the day, same day of the week, etc. An equivalence relation is like a 
generalized equality. Elements which are equivalent share some common 
or underlying property. As an example of this idea, consider a collection of 
sets. We say that two sets are equivalent if their elements can be put into 
a one-to-one correspondence; for example, a set of three battleships and a 
set of three cigars are equivalent. Any set of three objects shares with 
any other set of three objects a concept which we have abstracted and called 
"three." All other qualities which these sets may have are ignored. 

It is most natural, therefore, to group mutually equivalent elements 
together into classes which we call equivalence classes. Let us be specific 
about how this is done. For each a ∈ S, let S_a be the set of all elements in S equivalent to a; that is, b ∈ S_a if and only if b ~ a. We wish to show that the various sets we have thus defined are either disjoint or identical.

Suppose S_a ∩ S_b is not empty; that is, there exists a c ∈ S_a ∩ S_b such that c ~ a and c ~ b. By symmetry b ~ c, and by transitivity b ~ a. If d is any element of S_b, d ~ b and hence d ~ a. Thus d ∈ S_a and S_b ⊂ S_a. Since the relation between S_a and S_b is symmetric we also have S_a ⊂ S_b and hence S_a = S_b. Since a ∈ S_a, we have shown, in effect, that a proposed equivalence class can be identified by any element in it. An element selected from an equivalence class will be called a representative of that class.

An equivalence relation in a set S defines a partition of that set into equivalence classes in the following sense: (1) Every element of S is in some equivalence class, namely, a ∈ S_a. (2) Two elements are in the same equivalence class if and only if they are equivalent. (3) Non-identical equivalence classes are disjoint. On the other hand, a partition of a set into disjoint subsets can be used to define an equivalence relation; two elements are equivalent if and only if they are in the same subset.

The notions of equivalence relations and equivalence classes are not 
nearly so novel as they may seem at first. Most students have encountered 
these ideas before, although sometimes in hidden forms. For example, 
we may say that two differentiable functions are equivalent if and only if 




they have the same derivative. In calculus we use the letter "C" in describing the equivalence classes; for example, x³ + x² + 2x + C is the set (equivalence class) of all functions whose derivative is 3x² + 2x + 2.

In our study of matrices we have so far encountered four different equiva- 
lence relations: 

I. The matrices A and B are said to be left associate if there exists a non-singular matrix Q such that B = Q⁻¹A. Multiplication by Q⁻¹ corresponds to performing a sequence of elementary row operations. If A represents a linear transformation σ of U into V with respect to a basis A in U and a basis B in V, the matrix B represents σ with respect to A and a new basis in V.

II. The matrices A and B are said to be right associate if there exists a non-singular matrix P such that B = AP.

III. The matrices A and B are said to be associate if there exist non-singular matrices P and Q such that B = Q⁻¹AP. The term "associate" is not a standard term for this equivalence relation, the term most frequently used being "equivalent." It seems unnecessarily confusing to use the same term for one particular relation and for a whole class of relations. Moreover, this equivalence relation is perhaps the least interesting of the equivalence relations we shall study.

IV. The matrices A and B are said to be similar if there exists a non-singular matrix P such that B = P⁻¹AP. As we have seen (Section 4), similar matrices are representations of a single linear transformation of a vector space into itself. This is one of the most interesting of the equivalence relations, and Chapter III is devoted to a study of it.

Let us show in detail that the relation we have defined as left associate is an equivalence relation. The matrix Q⁻¹ appears in the definition because Q represents the matrix of transition. However, Q⁻¹ is just another non-singular matrix, so it is clearly the same thing to say that A and B are left associate if and only if there exists a non-singular matrix Q such that B = QA.

(1) A ~ A since IA = A.

(2) If A ~ B, there is a non-singular matrix Q such that B = QA. But then A = Q⁻¹B so that B ~ A.

(3) If A ~ B and B ~ C, there exist non-singular matrices Q and P such that B = QA and C = PB. But then PQA = PB = C and PQ is non-singular so that A ~ C.

For a given type of equivalence relation among matrices a normal form 
is a particular matrix chosen from each equivalence class. It is a repre- 
sentative of the entire class of equivalent matrices. In mathematics the 
terms "normal" and "canonical" are frequently used to mean "standard" 
in some particular sense. A normal form or canonical form is a standard 




form selected to represent a class of equivalent elements. A normal form 
should be selected to have the following two properties : Given any matrix 
A, (1) it should be possible by fairly direct and convenient methods to find 
the normal form of the equivalence class containing A, and (2) the method 
should lead to a unique normal form. 

Often the definition of a normal form is compromised with respect to the 
second of these desirable properties. For example, if the normal form were 
a matrix with complex numbers in the main diagonal and zeros elsewhere, 
to make the normal form unique it would be necessary to specify the order 
of the numbers in the main diagonal. But it is usually sufficient to know 
the numbers in the main diagonal without regard to their order, so it would 
be an awkward complication to have to specify their order. 

Normal forms have several uses. Perhaps the most important use is that 
the normal form should yield important or useful information about the 
concept that the matrix represents. This should be amply illustrated in 
the case of the concept of left associate and the Hermite normal form. We 
introduced the Hermite normal form through linear transformations, but 
we found that it yielded very useful information when the matrix was used 
to represent linear equations or bases of subspaces. 

Given two matrices, we can use the normal form to tell whether they are 
equivalent. It is often easier to reduce each to normal form and compare 
the normal forms than it is to transform one into the other. This is the case, 
for example, in the application described in the first part of Section 8. 

Sometimes, knowing the general appearance of the normal form, we can 
find all the information we need without actually obtaining the normal 
form. This is the case for the equivalence relation we have called associate. The normal form for this equivalence relation is described in Theorem 4.1. There is just one normal form for each possible value of the rank. The number of different equivalence classes is min {m, n} + 1.
of equivalence the rank of a matrix is the only property of importance. Any 
two matrices of the same rank are associate. In practice we can find the rank 
without actually computing the normal form of Theorem 4.1. And knowing 
the rank we know the normal form. 

We encounter several more equivalence relations among matrices. The 
type of equivalence introduced will depend entirely on the underlying con- 
cepts the matrices are used to represent. It is worth mentioning that for the 
equivalence relations we introduce there is no necessity to prove, as we did 
for an example above, that each is an equivalence relation. An underlying 
concept will be defined without reference to any coordinate system or choice 
of basis. The matrices representing this concept will transform according 
to certain rules when the basis is changed. Since a given basis can be retained 
the relation defined is reflexive. Since a basis changed can be changed back 




to the original basis, the relation defined is symmetric. A basis changed once 
and then changed again depends only on the final choice so that the relation is 
transitive. 

For a fixed basis A in U and B in V, two different linear transformations σ and τ of U into V are represented by different matrices. If it is possible, however, to choose bases A′ in U and B′ in V such that the matrix representing τ with respect to A′ and B′ is the same as the matrix representing σ with respect to A and B, then it is certainly clear that σ and τ share important geometric properties.

For a fixed σ, two matrices A and A′ representing σ with respect to different bases are related by a matrix equation of the form A′ = Q⁻¹AP. Since A and A′ represent the same linear transformation we feel that they should have some properties in common, those dependent upon σ.

These two points of view are really slightly different views of the same 
kind of relationship. In the second case, we can consider A and A' as 
representing two linear transformations with respect to the same basis, 
instead of the same linear transformation with respect to different bases. 

"1 01 

represents a reflection about the 
-lj 
represents a reflection about the # 2 -axis. When both 

1. 

linear transformations are referred to the same coordinate system they are 
different. However, for the purpose of discussing properties independent 
of a coordinate system they are essentially alike. The study of equivalence 
relations is motivated by such considerations, and the study of normal forms 
is aimed at determining just what these common properties are that are 
shared by equivalent linear transformations or equivalent matrices. 

To make these ideas precise, let σ and τ be linear transformations of V into itself. We say that σ and τ are similar if there exist bases A and B of V such that the matrix representing σ with respect to A is the same as the matrix representing τ with respect to B. If A and B are the matrices representing σ and τ with respect to A, and P is the matrix of transition from A to B, then P⁻¹BP is the matrix representing τ with respect to B. Thus σ and τ are similar if P⁻¹BP = A.

In a similar way we can define the concepts of left associate, right associate, 
and associate for linear transformations. 

*10 | Quotient Sets, Quotient Spaces

Definition. If S is any set on which an equivalence relation is defined, the collection of equivalence classes is called the quotient or factor set. Let S̄ denote the quotient set. An element of S̄ is an equivalence class. If a is an element of S and ā is the equivalence class containing a, the mapping η that maps a onto ā is well defined. This mapping is called the canonical mapping.

Although the concept of a quotient set might appear new to some, it is certain that almost everyone has encountered the idea before, perhaps in one guise or another. One example occurs in arithmetic. In this setting, let S be the set of all formal fractions of the form a/b where a and b are integers and b ≠ 0. Two such fractions, a/b and c/d, are equivalent if and only if ad = bc. Each equivalence class corresponds to a single rational number. The rules of arithmetic provide methods of computing with rational numbers by performing appropriate operations with formal fractions selected from the corresponding equivalence classes.

Let U be a vector space over F and let K be a subspace of U. We shall call two vectors α, β ∈ U equivalent modulo K if and only if their difference lies in K. Thus α ~ β if and only if α − β ∈ K. We must first show this defines an equivalence relation. (1) α ~ α because α − α = 0 ∈ K. (2) α ~ β ⇒ α − β ∈ K ⇒ β − α ∈ K ⇒ β ~ α. (3) {α ~ β and β ~ γ} ⇒ {α − β ∈ K and β − γ ∈ K}. Since K is a subspace, α − γ = (α − β) + (β − γ) ∈ K and, hence, α ~ γ. Thus "~" is an equivalence relation.

We wish to define vector addition and scalar multiplication in Ū. For α ∈ U, let ᾱ ∈ Ū denote the equivalence class containing α. α is called a representative of ᾱ. Since ᾱ may contain other elements besides α, it may happen that α ≠ α′ and yet ᾱ = ᾱ′. Let ᾱ and β̄ be two elements in Ū. Since α, β ∈ U, α + β is defined. We wish to define the equivalence class containing α + β to be the sum of ᾱ and β̄. In order for this to be well defined we must end up with the same equivalence class as the sum if different representatives are chosen from ᾱ and β̄. Suppose ᾱ = ᾱ′ and β̄ = β̄′. Then α − α′ ∈ K, β − β′ ∈ K, and (α + β) − (α′ + β′) ∈ K. Thus α + β and α′ + β′ determine the same equivalence class and the sum is well defined. Scalar multiplication is defined similarly. For a ∈ F, aᾱ is thus defined to be the equivalence class containing aα. These operations in Ū are said to be induced by the corresponding operations in U.

Theorem 10.1. If U is a vector space over F, and K is a subspace of U, the quotient set Ū with vector addition and scalar multiplication defined as above is a vector space over F.

proof. We leave this as an exercise. □

For any α ∈ U, the symbol α + K is used to denote the set of all elements in U that can be written in the form α + γ where γ ∈ K. (Strictly speaking, we should denote the set by {α} + K so that the plus sign combines two objects of the same type. The notation introduced here is traditional and simpler.) The set α + K is called a coset of K. If β ∈ α + K, then β − α ∈ K and β ~ α. Conversely, if β ~ α, then β − α = γ ∈ K so β ∈ α + K. Thus α + K is simply the equivalence class ᾱ containing α. Thus α + K = β + K if and only if α ∈ β̄ = β + K or β ∈ ᾱ = α + K.

The notation α + K to denote ᾱ is convenient to use in some calculations. For example, ᾱ + β̄ = (α + K) + (β + K) = (α + β) + K, and aᾱ = a(α + K) = aα + aK ⊂ aα + K. Notice that aᾱ and the coset aα + K are the same element of Ū when scalar multiplication is the induced operation, but that the set a(α + K) and the coset aα + K may not be the same when they are viewed as subsets of U (for example, let a = 0). However, since a(α + K) ⊂ aα + K, the set a(α + K) determines the desired coset in Ū for the induced operations. Thus we can compute effectively in Ū by doing the corresponding operations with representatives. This is precisely what is done when we compute in residue classes of integers modulo an integer m.

Definition. Ū with the induced operations is called a factor space or quotient space. In order to designate the role of the subspace K which defines the equivalence relation, Ū is usually denoted by U/K.

In our discussion of solutions of linear problems we actually encountered quotient spaces, but the discussion was worded in such a way as to avoid introducing this more sophisticated concept. Given the linear transformation σ of U into V, let K be the kernel of σ and let Ū = U/K be the corresponding quotient space. If α₁ and α₂ are solutions of the linear problem σ(ξ) = β, then σ(α₁ − α₂) = 0 so that α₁ and α₂ are in the same coset of K. Thus for each β ∈ Im(σ) there corresponds precisely one coset of K. In fact the correspondence between U/K and Im(σ) is an isomorphism, a fact which is made more precise in the following theorem.

Theorem 10.2. (First homomorphism theorem). Let σ be a linear transformation of U into V. Let K be the kernel of σ. Then σ can be written as the product of the canonical mapping η of U onto Ū = U/K and a monomorphism σ₁ of Ū into V.

proof. The canonical mapping η has already been defined. To define σ₁, for each ᾱ ∈ Ū let σ₁(ᾱ) = σ(α) where α is any representative of ᾱ. Since σ(α) = σ(α′) for α ~ α′, σ₁ is well defined. It is easily seen that σ₁ is a monomorphism since σ must have different values in different cosets. □

The homomorphism theorem is usually stated by saying, "The homo- 
morphic image is isomorphic to the quotient space of U modulo the kernel." 

Theorem 10.3. (Mapping decomposition theorem). Let σ be a linear transformation of U into V. Let K be the kernel of σ and I the image of σ. Then σ can be written as the product σ = ισ₁η, where η is the canonical mapping of U onto Ū = U/K, σ₁ is an isomorphism of Ū onto I, and ι is the injection of I into V.

proof. Let σ′ be the linear transformation of U onto I induced by restricting the codomain of σ to the image of σ. By Theorem 10.2, σ′ can be written in the form σ′ = σ₁η. □

Theorem 10.4. (Mapping factor theorem). Let S be a subspace of U and let Ū = U/S be the resulting quotient space. Let σ be a linear transformation of U into V, and let K be the kernel of σ. If S ⊂ K, then there exists a linear transformation σ₁ of Ū into V such that σ = σ₁η where η is the canonical mapping of U onto Ū.

proof. For each ᾱ ∈ Ū, let σ₁(ᾱ) = σ(α) where α ∈ ᾱ. If α′ is another representative of ᾱ, then α − α′ ∈ S ⊂ K. Thus σ(α) = σ(α′) and σ₁ is well defined. It is easy to check that σ₁ is linear. Clearly, σ(α) = σ₁(ᾱ) = σ₁(η(α)) for all α ∈ U, and σ = σ₁η. □

We say that σ factors through Ū.

Note that the homomorphism theorem is a special case of the factor theorem in which K = S.

Theorem 10.5. (Induced mapping theorem). Let U and V be vector spaces over F, and let τ be a linear transformation of U into V. Let U₀ be a subspace of U and let V₀ be a subspace of V. If τ(U₀) ⊂ V₀, it is possible to define in a natural way a mapping τ̄ of U/U₀ into V/V₀ such that σ₂τ = τ̄σ₁ where σ₁ is the canonical mapping of U onto Ū and σ₂ is the canonical mapping of V onto V̄.

proof. Consider σ = σ₂τ, which maps U into V̄. The kernel of σ is τ⁻¹(V₀). By assumption, U₀ ⊂ τ⁻¹(V₀). Hence, by the mapping factor theorem, there is a linear transformation τ̄ such that τ̄σ₁ = σ₂τ. □

We say that τ̄ is induced by τ.

Numerical calculations with quotient spaces can usually be avoided in problems involving finite dimensional vector spaces. If U is a vector space over F and K is a subspace of U, we know from Theorem 4.9 of Chapter I that K is a direct summand. Let U = K ⊕ W. Then the canonical mapping η maps W isomorphically onto U/K. Thus any calculation involving U/K can be carried out in W.

Although there are many possible choices for the complementary subspace W, the Hermite normal form provides a simple and effective way to select a W and a basis for it. This typically arises in connection with a linear problem. To see this, reexamine the proof of Theorem 5.1. There we let k₁, k₂, ..., k_ρ be those indices for which σ(αₖᵢ) ∉ σ(U_{kᵢ−1}). We showed there that {β′₁, ..., β′_ρ}, where β′ᵢ = σ(αₖᵢ), formed a basis of σ(U). {αₖ₁, αₖ₂, ..., αₖ_ρ} is a basis for a suitable W which is complementary to K(σ).






Example. Consider the linear transformation σ of R⁵ into R³ represented by the matrix

| 1  0  1  0  1 |
| 0  1  0  1  0 |
| 1  0  0 −1  0 |

It is easy to determine that the kernel K of σ is 2-dimensional with basis {(1, −1, −1, 1, 0), (0, 0, −1, 0, 1)}. This means that σ has rank 3 and the image of σ is all of R³. Thus R̄⁵ = R⁵/K is isomorphic to R³.

Consider the problem of solving the equation σ(ξ) = β, where β is represented by (b₁, b₂, b₃). To solve this problem we reduce the augmented matrix

| 1  0  1  0  1   b₁ |
| 0  1  0  1  0   b₂ |
| 1  0  0 −1  0   b₃ |

to the Hermite normal form

| 1  0  0 −1  0   b₃      |
| 0  1  0  1  0   b₂      |
| 0  0  1  1  1   b₁ − b₃ |



This means the solution ξ is represented by

(b₃, b₂, b₁ − b₃, 0, 0) + x₄(1, −1, −1, 1, 0) + x₅(0, 0, −1, 0, 1).

(b₃, b₂, b₁ − b₃, 0, 0) = b₁(0, 0, 1, 0, 0) + b₂(0, 1, 0, 0, 0) + b₃(1, 0, −1, 0, 0)

is a particular solution and a convenient basis for a subspace W complementary to K is {(0, 0, 1, 0, 0), (0, 1, 0, 0, 0), (1, 0, −1, 0, 0)}. σ maps b₁(0, 0, 1, 0, 0) + b₂(0, 1, 0, 0, 0) + b₃(1, 0, −1, 0, 0) onto (b₁, b₂, b₃). Hence, W is mapped isomorphically onto R³.
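A sketch of this computation (not from the text) with sympy, an assumed dependency: the kernel comes from the null space of the matrix, and applying the matrix to the chosen basis of W shows that W is carried onto R³.

from sympy import Matrix

A = Matrix([[1, 0, 1, 0, 1],
            [0, 1, 0, 1, 0],
            [1, 0, 0, -1, 0]])

print([list(v.T) for v in A.nullspace()])   # kernel basis: (1, -1, -1, 1, 0) and (0, 0, -1, 0, 1)

# columns are the basis of the complementary subspace W chosen in the text
W = Matrix([[0, 0, 1],
            [0, 1, 0],
            [1, 0, -1],
            [0, 0, 0],
            [0, 0, 0]])
print(A * W)    # the 3 x 3 identity: sigma maps this basis of W onto the standard basis of R^3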

This example also provides an opportunity to illustrate the working of the first homomorphism theorem. For any (x₁, x₂, x₃, x₄, x₅) ∈ R⁵,

(x₁, x₂, x₃, x₄, x₅) = (x₁ + x₃ + x₅)(0, 0, 1, 0, 0) + (x₂ + x₄)(0, 1, 0, 0, 0)
                       + (x₁ − x₄)(1, 0, −1, 0, 0)
                       + x₄(1, −1, −1, 1, 0) + x₅(0, 0, −1, 0, 1).

Thus (x₁, x₂, x₃, x₄, x₅) is mapped onto the coset (x₁ + x₃ + x₅)(0, 0, 1, 0, 0) + (x₂ + x₄)(0, 1, 0, 0, 0) + (x₁ − x₄)(1, 0, −1, 0, 0) + K under the natural homomorphism onto R⁵/K. This coset is then mapped isomorphically onto (x₁ + x₃ + x₅, x₂ + x₄, x₁ − x₄) ∈ R³. However, it is somewhat contrived to




work out an example of this type. The main importance of the first homo- 
morphism theorem is theoretical and not computational. 

*11 | Hom(U, V)

Let U and V be vector spaces over F. We have already observed in Section 1 that the set of all linear transformations of U into V can be made into a vector space over F by defining addition and scalar multiplication appropriately. In this section we will explore some of the elementary consequences of this observation. We shall call this vector space Hom(U, V), "the space of all homomorphisms of U into V."

Theorem 11.1. If dim U = n and dim V = m, then dim Hom(U, V) = mn.

proof. Let {α₁, ..., αₙ} be a basis of U and let {β₁, ..., βₘ} be a basis of V. Define the linear transformation σᵢⱼ by the rule

σᵢⱼ(αₖ) = δⱼₖβᵢ = Σᵣ₌₁ᵐ δᵣᵢδⱼₖβᵣ.                         (11.1)

Thus σᵢⱼ is represented by the matrix [δᵣᵢδⱼₖ] = Aᵢⱼ. Aᵢⱼ has a zero in every position except for a 1 in row i, column j.

The set {σᵢⱼ} is linearly independent. For if a linear relation existed among the σᵢⱼ it would be of the form

Σᵢ,ⱼ aᵢⱼσᵢⱼ = 0.

This means Σᵢ,ⱼ aᵢⱼσᵢⱼ(αₖ) = 0 for all αₖ. But Σᵢ,ⱼ aᵢⱼσᵢⱼ(αₖ) = Σᵢ,ⱼ aᵢⱼδⱼₖβᵢ = Σᵢ aᵢₖβᵢ = 0. Since {βᵢ} is a linearly independent set, aᵢₖ = 0 for i = 1, 2, ..., m. Since this is true for each k, all aᵢⱼ = 0 and {σᵢⱼ} is linearly independent.

If σ ∈ Hom(U, V) and σ(αₖ) = Σᵢ₌₁ᵐ aᵢₖβᵢ, then

σ(αₖ) = Σᵢ₌₁ᵐ aᵢₖβᵢ = Σᵢ₌₁ᵐ Σⱼ₌₁ⁿ aᵢⱼσᵢⱼ(αₖ).

Thus {σᵢⱼ} spans Hom(U, V), which is therefore of dimension mn. □
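In matrix terms the σᵢⱼ correspond to the matrices Aᵢⱼ with a single 1. A sketch (not from the text) in Python with sympy, an assumed dependency, showing that an arbitrary m × n matrix, i.e. an element of Hom(U, V) written in coordinates, is the combination Σ aᵢⱼAᵢⱼ.

from sympy import Matrix, zeros

def unit_matrices(m, n):
    # A_ij has a zero in every position except for a 1 in row i, column j
    return {(i, j): Matrix(m, n, lambda r, k: 1 if (r, k) == (i, j) else 0)
            for i in range(m) for j in range(n)}

A = Matrix([[2, 0, 1], [4, 5, 6]])   # an arbitrary element of Hom(U, V) with dim U = 3, dim V = 2
E = unit_matrices(2, 3)
expansion = sum((A[i, j] * E[i, j] for (i, j) in E), zeros(2, 3))
print(expansion == A)                # True: the mn matrices A_ij span, and they are clearly independent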

If V₁ is a subspace of V, every linear transformation of U into V₁ also defines a mapping of U into V. This mapping of U into V is a linear transformation of U into V. Thus, with each element of Hom(U, V₁) there is associated in a natural way an element of Hom(U, V). We can identify Hom(U, V₁) with a subset of Hom(U, V). With this identification Hom(U, V₁) is a subspace of Hom(U, V).

Now let U₁ be a subspace of U. In this case we cannot consider Hom(U₁, V) to be a subset of Hom(U, V) since a linear transformation in Hom(U₁, V) is not necessarily defined on all of U. But any linear transformation in Hom(U, V) is certainly defined on U₁. If σ ∈ Hom(U, V) we shall consider the mapping obtained by applying σ only to elements in U₁ to be a new function and denote it by R(σ). R(σ) is called the restriction of σ to U₁. We can consider R(σ) to be an element of Hom(U₁, V).

It may happen that different linear transformations defined on U produce the same restriction on U₁. We say that σ₁ and σ₂ are equivalent on U₁ if and only if R(σ₁) = R(σ₂). It is clear that R(σ + τ) = R(σ) + R(τ) and R(aσ) = aR(σ) so that the mapping of Hom(U, V) into Hom(U₁, V) is linear. We call this mapping R, the restriction mapping.

The kernel of R is clearly the set of all linear transformations in Hom(U, V) that vanish on U₁. Let us denote this kernel by U*.

If σ is any linear transformation belonging to Hom(U₁, V), it can be extended to a linear transformation σ̄ belonging to Hom(U, V) in many ways. If {α₁, ..., αₙ} is a basis of U such that {α₁, ..., αᵣ} is a basis of U₁, then let σ̄(αⱼ) = σ(αⱼ) for j = 1, ..., r, and let σ̄(αⱼ) be defined arbitrarily for j = r + 1, ..., n. Since σ is then the restriction of σ̄, we see that R is an epimorphism of Hom(U, V) onto Hom(U₁, V). Since Hom(U, V) is of dimension mn and Hom(U₁, V) is of dimension mr, U* is of dimension m(n − r).

Theorem 11.2. Hom(U₁, V) is canonically isomorphic to Hom(U, V)/U*. □

Note: It helps the intuitive understanding of this theorem to examine the method by which we obtained an extension of σ on U₁ to σ̄ on U. U* is the set of all extensions of σ when σ is the zero mapping, and one can see directly that the dimension of U* is (n − r)m.



chapter III

Determinants, eigenvalues, and similarity transformations



This chapter is devoted to the study of matrices representing linear transformations of a vector space into itself. We have seen that if A represents a linear transformation σ of V into itself with respect to a basis A, and P is the matrix of transition from A to a new basis A′, then P⁻¹AP = A′ is the matrix representing σ with respect to A′. In this case A and A′ are said to be similar and the mapping of A onto A′ = P⁻¹AP is called a similarity transformation (on the set of matrices, not on V).

Given σ, we seek a basis for which the matrix representing σ is particularly simple. In practice σ is given only implicitly by giving a matrix A representing σ. The problem, then, is to determine the matrix of transition P so that P⁻¹AP has the desired form. The matrix representing σ has its simplest form whenever σ maps each basis vector onto a multiple of itself; that is, whenever for each basis vector α there exists a scalar λ such that σ(α) = λα. It is not always possible to find such a basis, but there are some rather general conditions under which it is possible. These conditions include most cases of interest in the applications of this theory to physical problems.

The problem of finding non-zero α such that σ(α) = λα is equivalent to the problem of finding non-zero vectors in the kernel of σ − λ. This is a linear problem and we have given practical methods for solving it. But there is no non-zero solution to this problem unless σ − λ is singular. Thus we are faced with the problem of finding those λ for which σ − λ is singular. The values of λ for which σ − λ is singular are called the eigenvalues of σ, and the non-zero vectors α for which σ(α) = λα are called eigenvectors of σ.

We introduce some topics from the theory of determinants solely for 
the purpose of finding the eigenvalues of a linear transformation. Were 
it not for this use of determinants we would not discuss them in this book. 
Thus, the treatment given them here is very brief. 


Whenever a basis of eigenvectors exists, the use of determinants will 
provide a method for finding the eigenvalues and, knowing the eigenvalues, 
use of the Hermite normal form will enable us to find the eigenvectors. 
This method is convenient only for vector spaces of relatively small di- 
mension. For numerical work with large matrices other methods are 
required. 

The chapter closes with a discussion of what can be done if a basis of 
eigenvectors does not exist. 



1 | Permutations

To define determinants and handle them we have to know something 
about permutations. Accordingly, we introduce permutations in a form 
most suitable for our purposes and develop their elementary properties. 

A permutation π of a set S is a one-to-one mapping of S onto itself. We are dealing with permutations of finite sets and we take S to be the set of the first n integers; S = {1, 2, ..., n}. Let π(i) denote the element which π associates with i. Whenever we wish to specify a particular permutation we describe it by writing the elements of S in two rows; the first row containing the elements of S in any order and the second row containing the element π(i) directly below the element i in the first row. Thus for S = {1, 2, 3, 4}, the permutation π for which π(1) = 2, π(2) = 4, π(3) = 3, and π(4) = 1 can conveniently be described by the notations

(1 2 3 4)      (2 4 1 3)      (4 1 3 2)
(2 4 3 1),     (4 1 2 3), or  (1 2 3 4).

Two permutations acting on the same set of elements can be combined as functions. Thus, if π and σ are two permutations, σπ will denote that permutation mapping i onto σ[π(i)]; (σπ)(i) = σ[π(i)]. As an example, let π denote the permutation described above and let

σ = (1 2 3 4)
    (1 3 4 2).

Then

σπ = (1 2 3 4)
     (3 2 4 1).

Notice particularly that σπ ≠ πσ.

If π and σ are two given permutations, there is a unique permutation ρ such that ρπ = σ. Since ρ must satisfy the condition that ρ[π(i)] = σ(i), ρ can be described in our notation by writing the elements π(i) in the first row and the elements σ(i) in the second row. For the π and σ described above,

ρ = (2 4 3 1)
    (1 3 4 2).

The permutation that leaves all elements of S fixed is called the identity permutation and will be denoted by e. For a given π the unique permutation π⁻¹ such that π⁻¹π = e is called the inverse of π.

If for a pair of elements i < j in S we have π(i) > π(j), we say that π performs an inversion. Let k(π) denote the total number of inversions performed by π; we then say that π contains k(π) inversions. For the permutation π described above, k(π) = 4. The number of inversions in π⁻¹ is equal to the number of inversions in π.

For a permutation π, let sgn π denote the number (−1)^k(π). "Sgn" is an abbreviation for "signum" and we use the term "sgn π" to mean "the sign of π." If sgn π = 1, we say that π is even; if sgn π = −1, we say that π is odd.
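Counting inversions is easy to mechanize. A short Python sketch (standard library only, not from the text) that computes k(π) and sgn π for a permutation written as the bottom row of the two-row notation:

from itertools import combinations

def sgn(pi):
    # k(pi) = number of pairs i < j with pi(i) > pi(j)
    k = sum(1 for i, j in combinations(range(len(pi)), 2) if pi[i] > pi[j])
    return (-1) ** k

pi = (2, 4, 3, 1)     # the permutation used in the text; it contains k(pi) = 4 inversions
print(sgn(pi))        # 1, so pi is even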

Theorem 1.1. sgn σπ = sgn σ · sgn π.

proof. σ can be represented in the form

σ = (··· π(i)  ··· π(j)  ···)
    (··· σπ(i) ··· σπ(j) ···)

because every element of S appears in the top row. Thus, in counting the inversions in σ it is sufficient to compare π(i) and π(j) with σπ(i) and σπ(j). For a given i < j there are four possibilities:

1. i < j; π(i) < π(j); σπ(i) < σπ(j): no inversions.
2. i < j; π(i) < π(j); σπ(i) > σπ(j): one inversion in σ, one in σπ.
3. i < j; π(i) > π(j); σπ(i) > σπ(j): one inversion in π, one in σπ.
4. i < j; π(i) > π(j); σπ(i) < σπ(j): one inversion in π, one in σ, and none in σπ.

Examination of the above table shows that k(σπ) differs from k(σ) + k(π) by an even number. Thus sgn σπ = sgn σ · sgn π. □

Theorem 1.2. If a permutation π leaves an element of S fixed, the inversions involving that element need not be considered in determining whether π is even or odd.

proof. Suppose π(j) = j. There are j − 1 elements of S less than j and n − j elements of S larger than j. For i < j an inversion occurs if and only if π(i) > π(j) = j. Let k be the number of elements i in S preceding j for which π(i) > j. Then there must also be exactly k elements i of S following j for which π(i) < j. It follows that there are 2k inversions involving j. Since their number is even they may be ignored in determining sgn π. □




Theorem 1.3. A permutation which interchanges exactly two elements of S and leaves all other elements of S fixed is an odd permutation.

proof. Let π be a permutation which interchanges the elements i and j and leaves all other elements of S fixed. According to Theorem 1.2, in determining sgn π we can ignore the inversions involving all elements of S other than i and j. There is just one inversion left to consider and sgn π = −1. □

Among other things, this shows that there is at least one odd permutation. In addition, there is at least one even permutation. From this it is but a step to show that the number of odd permutations is equal to the number of even permutations.

Let σ be a fixed odd permutation. If π is an even permutation, σπ is odd. Furthermore, σ⁻¹ is also odd so that to each odd permutation τ there corresponds an even permutation σ⁻¹τ. Since σ⁻¹(σπ) = π, the mapping of the set of even permutations into the set of odd permutations defined by π → σπ is one-to-one and onto. Thus the number of odd permutations is equal to the number of even permutations.

EXERCISES 

1. Show that there are n! permutations of n objects.

2. There are six permutations of three objects. Determine which of them are 
even and which are odd. 

3. There are 24 permutations of four objects. By use of Theorem 1.2 and Exercise 2 we can determine the parity (evenness or oddness) of 15 of these permutations without counting inversions. Determine the parity of these 15 permutations by this method and the parity of the remaining nine by any other method.

4. The nine permutations of four objects that leave no object fixed can be divided into two types of permutations, those that interchange two pairs of objects and those that permute the four objects in some cyclic order. There are three permutations of the first type and six of the second. Find them. Knowing the parity of the 15 permutations that leave at least one object fixed, as in Exercise 3, and that exactly half of the 24 permutations must be even, determine the parity of these nine.

5. By counting the inversions determine the parity of

π = (1 2 3 4 5)
    (2 4 5 1 3).

Notice that π permutes the objects in {1, 2, 4} among themselves and the objects in {3, 5} among themselves. Determine the parity of π on each of these subsets separately and deduce the parity of π on all of S.






2 | Determinants

Let A = [aᵢⱼ] be a square n × n matrix. We wish to associate with this matrix a scalar that will in some sense measure the "size" of A and tell us whether or not A is non-singular.

Definition. The determinant of the matrix A = [aᵢⱼ] is defined to be the scalar det A = |aᵢⱼ| computed according to the rule

det A = |aᵢⱼ| = Σ_π (sgn π) a_{1π(1)} a_{2π(2)} ··· a_{nπ(n)},        (2.1)

where the sum is taken over all permutations π of the elements of S = {1, ..., n}. Each term of the sum is a product of n elements, each taken from a different row of A and from a different column of A, and sgn π. The number n is called the order of the determinant.



As a direct application of this definition we see that

    | a_{11}  a_{12} |
    |                |  =  a_{11}a_{22} − a_{12}a_{21},                        (2.2)
    | a_{21}  a_{22} |

    | a_{11}  a_{12}  a_{13} |
    | a_{21}  a_{22}  a_{23} |  =  a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32}
    | a_{31}  a_{32}  a_{33} |     − a_{12}a_{21}a_{33} − a_{13}a_{22}a_{31} − a_{11}a_{23}a_{32}.   (2.3)



In general, a determinant of order n will be the sum of n! products. As n increases, the amount of computation increases astronomically. Thus it is very desirable to develop more efficient ways of handling determinants.
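For small n the definition can nevertheless be applied literally. The following Python sketch of formula (2.1) is illustrative only; it runs over all n! permutations with itertools and uses an inversion count for sgn π.

    from itertools import permutations

    def sgn(perm):
        inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
                  if perm[i] > perm[j])
        return -1 if inv % 2 else 1

    def det_by_definition(A):
        n = len(A)
        total = 0
        for perm in permutations(range(n)):      # perm plays the role of pi
            term = sgn(perm)
            for i in range(n):
                term *= A[i][perm[i]]            # the factor a_{i, pi(i)}
            total += term
        return total

    print(det_by_definition([[1, 2], [3, 4]]))   # 1*4 - 2*3 = -2, as in (2.2)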

Theorem 2.1. det A^T = det A.

proof. In the expansion of det A each term is of the form

    (sgn π) a_{1π(1)} a_{2π(2)} ··· a_{nπ(n)}.

The factors of this term are ordered so that the indices of the rows appear in the usual order and the column indices appear in a permuted order. In the expansion of det A^T the same factors will appear, but they will be ordered according to the row indices of A^T, that is, according to the column indices of A. Thus this same product will appear in the form

    (sgn π⁻¹) a_{π⁻¹(1),1} a_{π⁻¹(2),2} ··· a_{π⁻¹(n),n}.

But since sgn π⁻¹ = sgn π, this term is identical to the one given above. Thus, in fact, all the terms in the expansion of det A^T are equal to corresponding terms in the expansion of det A, and det A^T = det A. □



90 Determinants, Eigenvalues, and Similarity Transformations | III 

A consequence of this discussion is that any property of determinants 
developed in terms of the rows (or columns) of A will also imply a cor- 
responding property in terms of the columns (or rows) of A. 

Theorem 2.2. If A' is the matrix obtained from A by multiplying a row 
(or column) of A by a scalar c, then det A' = c det A. 

proof. Each term of the expansion of det A contains just one element 
from each row of A. Thus multiplying a row of A by c introduces the factor c 
into each term of det A. Thus det A' = c det A. • □ 

Theorem 2.3. If A' is the matrix obtained from A by interchanging any 
two rows (or columns) of A, then det A' = — det A. 

proof. Interchanging two rows of A has the effect of interchanging two row indices of the elements appearing in A. If σ is the permutation interchanging these two indices, this operation has the effect of replacing each permutation π by the permutation πσ. Since σ is an odd permutation, this has the effect of changing the sign of every term in the expansion of det A. Therefore, det A' = −det A. □

Theorem 2.4. If A has two equal rows, det A = 0. 

proof. The matrix obtained from A by interchanging the two equal rows is identical to A, and yet, by Theorem 2.3, this operation must change the sign of the determinant. Since the only number equal to its negative is 0, det A = 0. □

Note: There is a minor point to be made here. If 1 + 1 = 0, the proof of this theorem is not valid, but the theorem is still true. To see this we return our attention to the definition of a determinant. In this case sgn π = 1 for both even and odd permutations. Then the terms in (2.1) can be grouped into pairs of equal terms. Since the sum of each pair is 0, the determinant is 0.

Theorem 2.5. If A' is the matrix obtained from A by adding a multiple of 
one row (or column) to another, then det A' = det A. 

proof. Let A' be the matrix obtained from A by adding c times row k to row j. Then

    det A' = Σ_π (sgn π) a_{1π(1)} ··· (a_{jπ(j)} + c a_{kπ(j)}) ··· a_{kπ(k)} ··· a_{nπ(n)}

           = Σ_π (sgn π) a_{1π(1)} ··· a_{jπ(j)} ··· a_{kπ(k)} ··· a_{nπ(n)}

           + c Σ_π (sgn π) a_{1π(1)} ··· a_{kπ(j)} ··· a_{kπ(k)} ··· a_{nπ(n)}.          (2.4)

The second sum on the right side of this equation is, in effect, the determinant of a matrix in which rows j and k are equal. Thus it is zero. The first term is just the expansion of det A. Therefore, det A' = det A. □

It is evident from the definition that, if I is the identity matrix, det I = 1.



2 I Determinants 



91 



If E is an elementary matrix of type I, det E = c where c is the scalar 
factor employed in the corresponding elementary operation. This follows 
from Theorem 2.2 applied to the identity matrix. 

If E is an elementary matrix of type II, det E = 1. This follows from 
Theorem 2.5 applied to the identity matrix. 

If E is an elementary matrix of type III, det E = — 1. This follows from 
Theorem 2.3 applied to the identity matrix. 

Theorem 2.6. If E is an elementary matrix and A is any matrix, then 
det EA = det E • det A = det AE. 

proof. This is an immediate consequence of Theorems 2.2, 2.5, 2.3, and 
the values of the determinants of the corresponding elementary matrices. □ 

Theorem 2.7. det A = 0 if and only if A is singular.

proof. If A is non-singular, it is a product of elementary matrices (see 
Chapter II, Theorem 6.1). Repeated application of Theorem 2.6 shows 
that det A is equal to the product of the determinants of the corresponding 
elementary matrices, and hence is non-zero. 

If A is singular, the rows are linearly dependent and one row is a linear combination of the others. By repeated application of elementary operations of type II we can obtain a matrix with a row of zeros. The determinant of this matrix is zero, and by Theorem 2.5 so also is det A. □

Theorem 2.8. If A and B are any two matrices of order n, then det AB = 
det A • det B = det BA. 

proof. If A and B are non-singular, the theorem follows by repeated 
application of Theorem 2.6. If either matrix is singular, then AB and BA 
are also singular and all terms are zero. □ 
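Theorems 2.2, 2.3, and 2.5 already describe a practical procedure: reduce the matrix by elementary row operations, keeping track of interchanges, until a superdiagonal form is reached, and then multiply the diagonal elements. The following Python sketch of this procedure is illustrative only; exact rational arithmetic is used to avoid rounding, and the test matrix is the one used in Exercises 2 and 3 below, whose determinant is 6.

    from fractions import Fraction

    def det_by_elimination(A):
        # Theorems 2.3 and 2.5: an interchange flips the sign, and adding a
        # multiple of one row to another leaves the determinant unchanged.
        A = [[Fraction(x) for x in row] for row in A]
        n, sign = len(A), 1
        for k in range(n):
            pivot = next((r for r in range(k, n) if A[r][k] != 0), None)
            if pivot is None:
                return Fraction(0)               # Theorem 2.7: A is singular
            if pivot != k:
                A[k], A[pivot] = A[pivot], A[k]
                sign = -sign
            for r in range(k + 1, n):
                m = A[r][k] / A[k][k]
                for c in range(k, n):
                    A[r][c] -= m * A[k][c]
        result = Fraction(sign)
        for k in range(n):
            result *= A[k][k]                    # product of the diagonal (Exercise 1)
        return result

    print(det_by_elimination([[3, 2, 2], [1, 4, 1], [-2, -4, -1]]))   # 6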



EXERCISES 

1. If all elements of a matrix below the main diagonal are zero, the matrix is said to be in superdiagonal form; that is, a_{ij} = 0 for i > j. If A = [a_{ij}] is in superdiagonal form, compute det A.

2. Theorem 2.6 provides an effective and convenient way to evaluate determinants. Verify the following sequence of steps.

    |  3  2  2 |     |  1  4  1 |     | 1   4   1 |     | 1  4  1 |     | 1  4  1 |
    |  1  4  1 | = − |  3  2  2 | = − | 0 −10  −1 | = − | 0 −2  1 | = − | 0 −2  1 |
    | −2 −4 −1 |     | −2 −4 −1 |     | 0   4   1 |     | 0  4  1 |     | 0  0  3 |





Now use the results of Exercise 1 to evaluate the last determinant. 



92 



Determinants, Eigenvalues, and Similarity Transformations | III 



3. Actually, to compute a determinant there is no need to obtain a superdiagonal 
form. And elementary column operations can be used as well as elementary row 
operations. Any sequence of steps that will result in a form with a large number of 
zero elements will be helpful. Verify the following sequence of steps. 



    |  3  2  2 |     |  3  2  2 |     |  3  0  2 |
    |  1  4  1 |  =  |  1  4  1 |  =  |  1  3  1 |
    | −2 −4 −1 |     | −1  0  0 |     | −1  0  0 |



This last determinant can be evaluated by direct use of the definition by computing 
just one product. Evaluate this determinant. 

4. Evaluate the determinants:

    (a)  |  1 −2 |        (b)  | 1  2  0  0 |
         | −1  3 |             | 1  1  3  4 |
                               | 0  1  5  6 |
                               | 1  2  3  4 |



5. Consider the real plane R². We agree that the two points (a_1, a_2), (b_1, b_2) suffice to describe a quadrilateral with corners at (0, 0), (a_1, a_2), (b_1, b_2), and (a_1 + b_1, a_2 + b_2). (See Fig. 2.) Show that the area of this quadrilateral is

    | a_1  a_2 |
    | b_1  b_2 |

Fig. 2



3 | Cofactors 



93 



Notice that the determinant can be positive or negative, and that it changes sign if the first and second rows are interchanged. To interpret the value of the determinant as an area, we must either use the absolute value of the determinant or give an interpretation to a negative area. We make the latter choice since to take the absolute value is to discard information. Referring to Fig. 2, we see that the direction of rotation from (a_1, a_2) to (b_1, b_2) across the enclosed area is the same as the direction of rotation from the positive x_1-axis to the positive x_2-axis. To interchange (a_1, a_2) and (b_1, b_2) would be to change the sense of rotation and the sign of the determinant. Thus the sign of the determinant determines an orientation of the quadrilateral on the coordinate system. Check the sign of the determinant for choices of (a_1, a_2) and (b_1, b_2) in various quadrants and various orientations.

6. (Continuation) Let E be an elementary transformation of R 2 onto itself. 
E maps the vertices of the given quadrilateral onto the vertices of another quad- 
rilateral. Show that the area of the new quadrilateral is det E times the area of the 
old quadrilateral. 

7. Let x_1, ..., x_n be a set of indeterminates. The determinant

        | 1  x_1  x_1²  ⋯  x_1^{n−1} |
    V = | 1  x_2  x_2²  ⋯  x_2^{n−1} |
        | ⋮                     ⋮    |
        | 1  x_n  x_n²  ⋯  x_n^{n−1} |

is called the Vandermonde determinant of order n.

(a) Show that V is a polynomial of degree n − 1 in each indeterminate separately and of degree n(n − 1)/2 in all the indeterminates together.

(b) Show that, for each i < j, V is divisible by x_j − x_i.

(c) Show that ∏_{1≤i<j≤n} (x_j − x_i) is a polynomial of degree n − 1 in each indeterminate separately, and of degree n(n − 1)/2 in all the indeterminates together.

(d) Show that V = ∏_{1≤i<j≤n} (x_j − x_i).



3 I Cofactors 

For a given pair i, j, consider in the expansion for det A those terms which have a_{ij} as a factor. Det A is of the form det A = a_{ij}A_{ij} + (terms which do not contain a_{ij} as a factor). The scalar A_{ij} is called the cofactor of a_{ij}.

In particular, we see that A_{11} = Σ (sgn π) a_{2π(2)} ··· a_{nπ(n)}, where this sum includes all permutations π that leave 1 fixed. Each such π defines a permutation π' on S' = {2, ..., n} which coincides with π on S'. Since no inversion of π involves the element 1, we see that sgn π = sgn π'. Thus A_{11} is a determinant, the determinant of the matrix obtained from A by crossing out the first row and the first column of A.



94 Determinants, Eigenvalues, and Similarity Transformations | III 

A similar procedure can be used to compute the cofactors A_{ij}. By a sequence of elementary row and column operations of type III we can obtain a matrix in which the element a_{ij} is moved into row 1, column 1. By applying the observation of the previous paragraph we see that the cofactor A_{ij} is essentially the determinant of the matrix obtained by crossing out the row and column containing the element a_{ij}. Furthermore, we can keep the other rows and columns in the same relative order if the sequence of operations we use interchanges only adjacent rows or columns. It takes i − 1 interchanges to move the element a_{ij} into the first row, and it takes j − 1 interchanges to move it into the first column. Thus A_{ij} is (−1)^{(i−1)+(j−1)} = (−1)^{i+j} times the determinant of the matrix obtained by crossing out the ith row and the jth column of A.

Each term in the expansion of det A contains exactly one factor from each row and each column of A. Thus, for any given row of A each term of det A contains exactly one factor from that row. Hence, for any given i,

    det A = Σ_j a_{ij} A_{ij}.                                   (3.1)

Similarly, for any given column of A each term of det A contains exactly one factor from that column. Hence, for any given k,

    det A = Σ_j a_{jk} A_{jk}.                                   (3.2)

These expansions of a determinant according to the cofactors of a row or column reduce the problem of computing an nth order determinant to that of computing n determinants of order n − 1. We have already given
explicit expansions for determinants of orders 2 and 3, and the technique 
of expansions according to cofactors enables us to compute determinants 
of higher orders. The labor of evaluating a determinant of even quite 
modest order is still quite formidable, however, and we make some suggestions 
as to how the work can be minimized. 

First, observe that if any row or column has several zeros in it, expansion 
according to cofactors of that row or column will require the evaluation of 
only those cofactors corresponding to non-zero elements. It is clear that 
the presence of several zeros in any row or column would considerably 
reduce the labor. If we are not fortunate enough to find such a row or 
column, we can produce a row or column with a large number of zeros by 
applying some elementary operations of type II. For example, consider 
the determinant 



            |  3  2 −2 10 |
    det A = |  3  1  1  2 |
            | −2  2  3  4 |
            |  1  1  5  2 |


3 I Cofactors 



95 



If the numbers appearing in the array were unwieldy, there would be no 
choice but to wade in and make the best of it. The numbers in our example 
are all integers, and we will not introduce fractions if we take advantage 
of the l's that appear in the array. By Theorem 2.5, a sequence of elementary 
operations of type II will not change the value of the determinant. Thus we 
can obtain 

            | 0 −1 −17   4 |       | −1 −17   4 |
    det A = | 0 −2 −14  −4 |  = −  | −2 −14  −4 |
            | 0  4  13   8 |       |  4  13   8 |
            | 1  1   5   2 |

Now we face several options. We can expand the 3rd order determinant as it stands; we can try the same technique again; or we can try to remove a common factor from some row or column. We can remove the common factor −1 from the second row and the common factor 4 from the third column. Although 2 is a factor of the second row, we cannot remove both a 2 from the second row and a 4 from the third column. Thus we can obtain



              | −1 −17  1 |       | −1 −17  1 |
    det A = 4 |  2  14  1 |  =  4 |  3  31  0 |  =  4 | 3  31 |  =  4(141 − 186)  =  −180.
              |  4  13  2 |       |  6  47  0 |       | 6  47 |
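The expansion (3.1) can also be turned into a recursive procedure. The following Python sketch expands along the first row and is illustrative only; for hand computation one would first produce zeros, as above. Applied to the matrix of the example it reproduces the value −180.

    def minor(A, i, j):
        # the matrix obtained by crossing out row i and column j
        return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

    def det_cofactor(A):
        n = len(A)
        if n == 1:
            return A[0][0]
        # expansion along the first row: sum over j of a_{1j} * (-1)^{1+j} * M_{1j}
        return sum(A[0][j] * (-1) ** j * det_cofactor(minor(A, 0, j))
                   for j in range(n) if A[0][j] != 0)

    print(det_cofactor([[3, 2, -2, 10],
                        [3, 1, 1, 2],
                        [-2, 2, 3, 4],
                        [1, 1, 5, 2]]))          # -180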



If we multiply the elements in row i by the cofactors of the elements in row k ≠ i, we get the same result as we would if the elements in row k were equal to the elements in row i. Hence,

    Σ_j a_{ij} A_{kj} = 0   for i ≠ k,                           (3.3)

and

    Σ_i a_{ij} A_{ik} = 0   for j ≠ k.                           (3.4)

The various relations we have developed between the elements of a matrix and their cofactors can be summarized in the form

    Σ_j a_{ij} A_{kj} = δ_{ik} det A,                            (3.5)

    Σ_i a_{ij} A_{ik} = δ_{jk} det A.                            (3.6)



If A = [a_{ij}] is any square matrix and A_{ij} is the cofactor of a_{ij}, the matrix [A_{ij}]^T = adj A is called the adjunct of A. What we here call the "adjunct"



96 



Determinants, Eigenvalues, and Similarity Transformations | III 



is traditionally called the "adjoint." Unfortunately, the term "adjoint"
is also used to denote a linear transformation that is not represented by the 
adjoint (or adjunct) matrix. A new term is badly needed. We shall have a 
use for the adjunct matrix only in this chapter. Thus, this unconventional 
terminology will cause only a minor inconvenience and help to avoid con- 
fusion. 



Theorem 3.1. A · adj A = (adj A) · A = (det A) · I.

proof.

    A · adj A = [a_{ij}] · [A_{kl}]^T = [ Σ_j a_{ij} A_{kj} ] = [ δ_{ik} det A ] = (det A) · I.        (3.7)

    (adj A) · A = [A_{kl}]^T · [a_{ij}] = [ Σ_i A_{ik} a_{ij} ] = [ δ_{kj} det A ] = (det A) · I.  □   (3.8)



Theorem 3.1 provides us with an effective technique for computing the 
inverse of a non-singular matrix. However, it is effective only in the sense 
that the inverse can be computed by a prescribed sequence of steps. The 
number of steps is large for matrices of large order, and it is not sufficiently 
small for matrices of low order to make it a preferred technique. The method 
described in Section 6 of Chapter II is the best method that is developed in 
this text. In numerical analysis where matrices of large order are inverted, 
highly specialized methods are available. But a discussion of such methods 
is beyond the scope of this book. 

A matrix A is non-singular if and only if det A ≠ 0, and in this case we can see from the theorem that

    A⁻¹ = (1 / det A) · adj A.

This is illustrated in the following example.

        |  1  2  3 |               | −3  5  1 |
    A = |  2  1  2 |,     adj A =  | −2  5  4 |,
        | −2  1 −1 |               |  4 −5 −3 |

                  | −3  5  1 |
    A⁻¹ = (1/5) · | −2  5  4 |.                                  (3.9)
                  |  4 −5 −3 |
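The computation of adj A and of A⁻¹ = (1/det A) adj A is mechanical. The following Python sketch is illustrative only; the determinant routine is the simple cofactor expansion, repeated here so that the sketch is self-contained, and the matrix is the one of the example above.

    from fractions import Fraction

    def minor(A, i, j):
        return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

    def det(A):
        if len(A) == 1:
            return A[0][0]
        return sum(A[0][j] * (-1) ** j * det(minor(A, 0, j)) for j in range(len(A)))

    def adjunct(A):
        n = len(A)
        # (adj A)_{ij} = A_{ji} = (-1)^{j+i} times the minor of a_{ji}
        return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)]
                for i in range(n)]

    def inverse(A):
        d = Fraction(det(A))
        return [[Fraction(x) / d for x in row] for row in adjunct(A)]

    A = [[1, 2, 3], [2, 1, 2], [-2, 1, -1]]
    print(det(A))        # 5
    print(adjunct(A))    # [[-3, 5, 1], [-2, 5, 4], [4, -5, -3]]
    print(inverse(A))    # each entry of adj A divided by 5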



The relations between the elements of a matrix and their cofactors lead 
to a method for solving a system of n simultaneous equations in n unknowns 



3 I Cofactors 



97 



when the equations are independent. Suppose we are given the system of equations

    Σ_{j=1}^n a_{ij} x_j = b_i,    (i = 1, 2, ..., n).           (3.10)

The assumption that the equations are independent is expressed in the condition that det A ≠ 0, where A = [a_{ij}]. Let A_{ij} be the cofactor of a_{ij}. Then for a given k

    Σ_{i=1}^n A_{ik} ( Σ_{j=1}^n a_{ij} x_j ) = Σ_{j=1}^n ( Σ_{i=1}^n A_{ik} a_{ij} ) x_j
                                              = Σ_{j=1}^n δ_{kj} det A · x_j
                                              = det A · x_k = Σ_{i=1}^n A_{ik} b_i.      (3.11)

Since det A ≠ 0 we see that

    x_k = ( Σ_{i=1}^n A_{ik} b_i ) / det A.                      (3.12)

The numerator can be interpreted as the cofactor expansion of the determinant of the matrix obtained by replacing the kth column of A by the column of the b_i. In this form the method is known as Cramer's rule.

Cramer's rule is convenient for systems of equations of low order, but 
it fails if the system of equations is dependent or the number of equations 
is different from the number of unknowns. Even in these cases Cramer's 
rule can be modified to provide solutions. However, the methods we have 
already developed are usually easier to apply, and the balance in their favor 
increases as the order of the system of equations goes up and the nullity 
increases. 
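A Python sketch of Cramer's rule in the form (3.12) follows. It is illustrative only; the determinant routine is the same cofactor expansion as before, and the system solved at the end, x_1 + 2x_2 = 5, 3x_1 + 4x_2 = 6, is ours and serves merely as a check.

    from fractions import Fraction

    def det(A):
        if len(A) == 1:
            return A[0][0]
        minor = lambda A, j: [row[:j] + row[j + 1:] for row in A[1:]]
        return sum(A[0][j] * (-1) ** j * det(minor(A, j)) for j in range(len(A)))

    def cramer(A, b):
        d = det(A)
        if d == 0:
            raise ValueError("det A = 0: Cramer's rule does not apply")
        x = []
        for k in range(len(A)):
            # replace column k of A by the column of the b_i
            Ak = [row[:k] + [b[i]] + row[k + 1:] for i, row in enumerate(A)]
            x.append(Fraction(det(Ak), d))
        return x

    print(cramer([[1, 2], [3, 4]], [5, 6]))      # [-4, 9/2]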



EXERCISES 

1. In the determinant

    |  2  7  5  8 |
    |  7 −1  2  5 |
    |  1  0  4  2 |
    | −3  6 −1  2 |

find the cofactor of the "8"; find the cofactor of the "−3."

2. The expansion of a determinant in terms of a row or column, as in formulas 
(3.1) and (3.2), provides a convenient method for evaluating determinants. The 



98 



Determinants, Eigenvalues, and Similarity Transformations | III 



amount of work involved can be reduced if a row or column is chosen in which 
some of the elements are zeros. Expand the determinant 



    |  1  3  4 −1 |
    |  2  2  0  1 |
    |  0 −1  1  3 |
    | −3  0  1  2 |
in terms of the cofactors of the third row. 

3. It is even more convenient to combine an expansion in terms of cofactors 
with the method of elementary row and column operations described in Section 2. 
Subtract appropriate multiples of column 2 from the other columns to obtain 



    |  1  3  7  8 |
    |  2  2  2  7 |
    |  0 −1  0  0 |
    | −3  0  1  2 |
and expand this determinant in terms of cofactors of the third row. 

4. Show that det (adj A) = (det A)^{n−1}.

5. Show that a matrix A is non-singular if and only if adj A is non-singular.

6. Let A = [a_{ij}] be an arbitrary n × n matrix and let adj A be the adjunct of A. If X = (x_1, ..., x_n) and Y = (y_1, ..., y_n), show that

    Y (adj A) X^T  =  − | A    X^T |
                        | Y     0  |

For notation see pages 42 and 55.



4 I The Hamilton-Cayley Theorem 

Let p(x) = a_m x^m + ··· + a_0 be a polynomial in an indeterminate x with scalar coefficients a_i. If A is an n × n matrix, by p(A) we mean the matrix a_m A^m + a_{m−1} A^{m−1} + ··· + a_0 I. Notice particularly that the constant term a_0 must be replaced by a_0 I so that each term of p(A) will be a matrix. No particular problem is encountered with matric polynomials of this form since all powers of a single matrix commute with each other. Any polynomial identity will remain valid if the indeterminate is replaced



4 | The Hamilton-Cayley Theorem 



99 



by a matrix, provided any scalar terms are replaced by corresponding scalar 
multiples of the identity matrix. 

We may also consider polynomials with matric coefficients. To make 
sense, all coefficients must be matrices of the same order. We consider 
only the possibility of substituting scalars for the indeterminate, and in all 
manipulations with such polynomials the matric coefficients commute with 
the powers of the indeterminate. Polynomials with matric coefficients can 
be added and multiplied in the usual way, but the order of the factors 
is important in multiplication since the coefficients may not commute. The 
algebra of polynomials of this type is not simple, but we need no more than 
the observation that two polynomials with matric coefficients are equal if 
and only if they have exactly the same coefficients. 

We avoid discussing the complications that can occur for polynomials 
with matric coefficients in a matric variable. 

Now we should like to consider matrices for which the elements are 
polynomials. If F is the field of scalars for the set of polynomials in the 
indeterminate x, let K be the set of all rational functions in x; that is, the 
set of all permissible quotients of polynomials in x. It is not difficult to show 
that K is a field. Thus a matrix with polynomial components is a special 
case of a matrix with elements in K. 

From this point of view a polynomial with matric coefficients can be 
expressed as a single matrix with polynomial components. For example, 

    | x² + 2         2x − 1  |     |  1  0 |       |  0  2 |      |  2 −1 |
    |                        |  =  |       | x²  + |       | x  + |       | .
    | −x² − 2x + 1   2x² + 1 |     | −1  2 |       | −2  0 |      |  1  1 |

Conversely, a matrix in which the elements are polynomials in an indeter- 
minate x can be expanded into a polynomial with matric coefficients. Since 
polynomials with matric coefficients and matrices with polynomial compo- 
nents can be converted into one another, we refer to both types of expressions 
as polynomial matrices. 

Definition. If A is any square matrix, the polynomial matrix A − xI = C is called the characteristic matrix of A. C has the form



        | a_{11} − x    a_{12}      ⋯    a_{1n}     |
        | a_{21}        a_{22} − x  ⋯    a_{2n}     |
    C = |    ⋮              ⋮                ⋮      |            (4.1)
        | a_{n1}        a_{n2}      ⋯    a_{nn} − x |



100 Determinants, Eigenvalues, and Similarity Transformations | III 

The determinant of C is a polynomial det C = f(x) = k_n x^n + k_{n−1} x^{n−1} + ··· + k_0 of degree n; it is called the characteristic polynomial of A. The equation f(x) = 0 is called the characteristic equation of A. First, we should observe that the coefficient of x^n in the characteristic polynomial is (−1)^n, the coefficient of x^{n−1} is (−1)^{n−1} Σ_{i=1}^n a_{ii}, and the constant term is k_0 = det A.

Theorem 4.1. (Hamilton-Cayley theorem). If A is a square matrix and f(x) is its characteristic polynomial, then f(A) = 0.

proof. Since C is of order n, adj C will contain polynomials in x of degree not higher than n − 1. Hence adj C can be expanded into a polynomial with matric coefficients of degree at most n − 1:

    adj C = C_{n−1} x^{n−1} + C_{n−2} x^{n−2} + ··· + C_1 x + C_0,           (4.2)

where each C_i is a matrix with scalar elements.
By Theorem 3.1 we have

    adj C · C = det C · I = f(x) I
              = adj C · (A − xI) = (adj C) A − (adj C) x.                    (4.3)

Hence,

    k_n I x^n + k_{n−1} I x^{n−1} + ··· + k_1 I x + k_0 I
        = −C_{n−1} x^n + (C_{n−1} A − C_{n−2}) x^{n−1} + ··· + (C_1 A − C_0) x + C_0 A.   (4.4)

The expressions on the two sides of this equality are n × n polynomial matrices. Since two polynomial matrices are equal if and only if the corresponding coefficients are equal, (4.4) is equivalent to the following set of matric equations:

    k_n I     = −C_{n−1}
    k_{n−1} I = C_{n−1} A − C_{n−2}
       ⋮                                                                     (4.5)
    k_1 I     = C_1 A − C_0
    k_0 I     = C_0 A.

Multiply each of these equations by A^n, A^{n−1}, ..., A, I from the right, respectively, and add them. The terms on the right side will cancel out, leaving the zero matrix. The terms on the left add up to

    k_n A^n + k_{n−1} A^{n−1} + ··· + k_1 A + k_0 I = f(A) = 0.  □           (4.6)
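The theorem is easy to verify numerically for a particular matrix. The following numpy sketch is illustrative only; it evaluates f(A) by Horner's rule for the matrix of Exercise 2 below and prints a matrix of zeros, up to rounding.

    import numpy as np

    A = np.array([[2., -2., 3.],
                  [1.,  1., 1.],
                  [1.,  3., -1.]])

    # coefficients of f(x) = det(A - xI), highest power first;
    # np.poly gives the monic polynomial det(xI - A), so multiply by (-1)^n
    coeffs = np.poly(A) * (-1) ** A.shape[0]

    f_of_A = np.zeros_like(A)
    for c in coeffs:
        f_of_A = f_of_A @ A + c * np.eye(3)      # Horner evaluation of f(A)

    print(np.round(f_of_A, 10))                  # the zero matrix, as Theorem 4.1 asserts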

The equation m(x) = 0 of lowest degree which A satisfies is called the minimum equation (or minimal equation) for A; m(x) is called the minimum polynomial for A. Since A satisfies its characteristic equation, the degree of m(x) is not more than n. Since a linear transformation and any matrix



4 | The Hamilton-Cayley Theorem 101 

representing it satisfy the same relations, similar matrices satisfy the same 
set of polynomial equations. In particular, similar matrices have the same 
minimum polynomials. 

Theorem 4.2. If g(x) is any polynomial with coefficients in F such that g(A) = 0, then g(x) is divisible by the minimum polynomial for A. The minimum polynomial is unique except for a possible non-zero scalar factor.

proof. Upon dividing g(x) by m(x) we can write g(x) in the form

    g(x) = m(x) · q(x) + r(x),                                   (4.7)

where q(x) is the quotient polynomial and r(x) is the remainder, which is either identically zero or is a polynomial of degree less than the degree of m(x). If g(x) is a polynomial such that g(A) = 0, then

    g(A) = 0 = m(A) · q(A) + r(A) = r(A).                        (4.8)

This would contradict the selection of m(x) as the minimum polynomial for A unless the remainder r(x) is identically zero. Since two polynomials of the same lowest degree must divide each other, they must differ by a scalar factor. □

As we have pointed out, the elements of adj C are polynomials of degree at most n − 1. Let g(x) be the greatest common divisor of the elements of adj C. Since adj C · C = f(x) I, g(x) divides f(x).

Theorem 4.3. h(x) = f(x)/g(x) is the minimum polynomial for A.

proof. Let adj C = g(x) B, where the elements of B have no non-scalar common factor. Since adj C · C = f(x) I we have h(x) · g(x) I = g(x) BC. Since g(x) ≠ 0 this yields

    BC = h(x) I.                                                 (4.9)

Using B in place of adj C we can repeat the argument used in the proof of the Hamilton-Cayley theorem to deduce that h(A) = 0. Thus h(x) is divisible by m(x).

On the other hand, consider the polynomial m(x) − m(y). Since it is a sum of terms of the form c_i(x^i − y^i), each of which is divisible by y − x, m(x) − m(y) is divisible by y − x:

    m(x) − m(y) = (y − x) · k(x, y).                             (4.10)

Replacing x by xI and y by A we have

    m(xI) − m(A) = m(x) I = (A − xI) · k(xI, A) = C · k(xI, A).  (4.11)

Multiplying by adj C we have

    m(x) adj C = (adj C) C · k(xI, A) = f(x) · k(xI, A).         (4.12)



102 Determinants, Eigenvalues, and Similarity Transformations | III 

Hence,

    m(x) · g(x) B = h(x) · g(x) · k(xI, A),                      (4.13)

or

    m(x) B = h(x) · k(xI, A).                                    (4.14)

Since h(x) divides every element of m(x) B and the elements of B have no non-scalar common factor, h(x) divides m(x). Thus, h(x) and m(x) differ at most by a scalar factor. □

Theorem 4.4. Each irreducible factor of the characteristic polynomial f(x) of A is also an irreducible factor of the minimum polynomial m(x).
proof. As we have seen in the proof of the previous theorem,

    m(x) I = C · k(xI, A).

Thus

    det m(x) I = [m(x)]^n = det C · det k(xI, A)
               = f(x) · det k(xI, A).                            (4.15)

We see then that every irreducible factor of f(x) divides [m(x)]^n, and therefore m(x) itself. □

Theorem 4.4 shows that a characteristic polynomial without repeated 
factors is also the minimum polynomial. As we shall see, it is the case in 
which the characteristic polynomial has repeated factors that generally causes 
trouble. 

We now ask the converse question. Given the polynomial f(x) = (−1)^n x^n + k_{n−1} x^{n−1} + ··· + k_0, does there exist an n × n matrix A for which f(x) is the minimum polynomial?

Let A = {α_1, ..., α_n} be any basis and define the linear transformation σ by the rules

    σ(α_i) = α_{i+1}   for i < n,                                (4.16)

and

    (−1)^n σ(α_n) = −k_0 α_1 − k_1 α_2 − ··· − k_{n−1} α_n.

It follows directly from the definition of σ that

    f(σ)(α_1) = (−1)^n σ(α_n) + k_{n−1} α_n + ··· + k_1 α_2 + k_0 α_1 = 0.   (4.17)

For any other basis element we have

    f(σ)(α_j) = f(σ)[σ^{j−1}(α_1)] = σ^{j−1}[f(σ)(α_1)] = 0.                 (4.18)

Since f(σ) vanishes on the basis elements, f(σ) = 0 and any matrix representing σ satisfies the equation f(x) = 0.



4 | The Hamilton-Cayley Theorem 103 

On the other hand, σ cannot satisfy an equation of lower degree because the corresponding polynomial in σ applied to α_1 could be interpreted as a relation among the basis elements. Thus, f(x) is a minimum polynomial for σ and for any matrix representing σ. Since f(x) is of degree n, it must also be the characteristic polynomial of any matrix representing σ.

With respect to the basis A the matrix representing σ is

        | 0  0  ⋯  0  −(−1)^n k_0     |
        | 1  0  ⋯  0  −(−1)^n k_1     |
    A = | 0  1  ⋯  0  −(−1)^n k_2     |                          (4.19)
        | ⋮         ⋮        ⋮        |
        | 0  0  ⋯  1  −(−1)^n k_{n−1} |

A is called the companion matrix of f(x).

Theorem 4.5. f(x) is a minimum polynomial for its companion matrix. □ 
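The construction (4.19) is easy to automate. The following Python sketch is illustrative only; the polynomial is passed by its coefficients k_0, ..., k_{n−1}, the leading coefficient being (−1)^n as in the text, and the sample polynomial is the one of Exercise 1 below.

    def companion(ks):
        # ks = [k_0, k_1, ..., k_{n-1}]; the leading coefficient is (-1)^n
        n = len(ks)
        A = [[0] * n for _ in range(n)]
        for i in range(1, n):
            A[i][i - 1] = 1                      # sigma(alpha_i) = alpha_{i+1}
        for i in range(n):
            A[i][n - 1] = -(-1) ** n * ks[i]     # last column, as in (4.19)
        return A

    # companion matrix of f(x) = -x^3 + 39x - 90, so k_0 = -90, k_1 = 39, k_2 = 0
    for row in companion([-90, 39, 0]):
        print(row)
    # [0, 0, -90]
    # [1, 0,  39]
    # [0, 1,   0]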

EXERCISES 

1. Show that −x³ + 39x − 90 is the characteristic polynomial for the matrix

    | 0  0 −90 |
    | 1  0  39 |
    | 0  1   0 |

2. Find the characteristic polynomial for the matrix 

    |  2 −2  3 |
    |  1  1  1 |
    |  1  3 −1 |
and show by direct substitution that this matrix satisfies its characteristic equation. 

3. Find the minimum polynomial for the matrix 

    |  3  2  2 |
    |  1  4  1 |
    | −2 −4 −1 |

4. Write down a matrix which has x⁴ + 3x³ + 2x² − x + 6 = 0 as its minimum
equation. 



104 Determinants, Eigenvalues, and Similarity Transformations | III 

5. Show that if the matrix A satisfies the equation x² + x + 1 = 0, then A is non-singular and the inverse A⁻¹ is expressible as a linear combination of A and I.

6. Show that no real 3 × 3 matrix satisfies x² + 1 = 0. Show that there are complex 3 × 3 matrices which do. Show that there are real 2 × 2 matrices that satisfy the equation.

7. Find a 2 × 2 matrix with integral elements satisfying the equation x³ − 1 = 0, but not satisfying the equation x − 1 = 0.

8. Show that the characteristic polynomial of 

    |  7  4 −4 |
    |  4 −8 −1 |
    | −4 −1 −8 |

is not its minimum polynomial. What is the minimum polynomial?

5 I Eigenvalues and Eigenvectors 

Let σ be a linear transformation of V into itself. It is often useful to find subspaces of V in which σ also acts as a linear transformation. If W is such a subspace, this means that σ(W) ⊂ W. A subspace with this property is called an invariant subspace of V under σ. Generally, the problem of determining the properties of σ on V can be reduced to the problem of determining the properties of σ on the invariant subspaces.

The simplest and most restricted case occurs when an invariant subspace W is of dimension 1. In that case, let {α_1} be a basis for W. Then, since σ(α_1) ∈ W, there is a scalar λ_1 such that σ(α_1) = λ_1 α_1. Also for any α ∈ W, α = a_1 α_1 and hence σ(α) = a_1 σ(α_1) = a_1 λ_1 α_1 = λ_1 α. In some sense the scalar λ_1 is characteristic of the invariant subspace W; σ stretches every vector in W by the factor λ_1.

In general, a problem of finding those scalars λ and associated vectors ξ for which σ(ξ) = λξ is called an eigenvalue problem. A non-zero vector ξ is called an eigenvector of σ if there exists a scalar λ such that σ(ξ) = λξ. A scalar λ is called an eigenvalue of σ if there exists a non-zero vector ξ such that σ(ξ) = λξ. Notice that the equation σ(ξ) = λξ is an equation in two variables, one of which is a vector and the other a scalar. The solution ξ = 0 and λ any scalar is a solution we choose to ignore since it will not lead to an invariant subspace of positive dimension. Without further conditions we have no assurance that the eigenvalue problem has any other solutions.

A typical and very important eigenvalue problem occurs in the solution 
of partial differential equations of the form 

    ∂²u/∂x² + ∂²u/∂y² = 0,



5 | Eigenvalues and Eigenvectors 105 

subject to the boundary conditions that u(0, y) = u(π, y) = 0, lim_{y→∞} u(x, y) = 0, and u(x, 0) = f(x),

where f(x) is a given function. The standard technique of separation of 
variables leads us to try to construct a solution which is a sum of functions 
of the form XY where X is a function of x alone and Y is a function of 
y alone. For this type of function, the partial differential equation becomes

    (d²X/dx²) Y + X (d²Y/dy²) = 0.

Since

    (1/Y)(d²Y/dy²) = −(1/X)(d²X/dx²)

is a function of x alone and also a function of y alone, it must be a constant (scalar) which we shall call k². Thus we are trying to solve the equations

    d²X/dx² = −k² X,        d²Y/dy² = k² Y.

These are eigenvalue problems as we have defined the term. The vector space is the space of infinitely differentiable functions over the real numbers and the linear transformation is the differential operator d²/dx².
For a given value of k² (k > 0) the solutions would be

    X = a_1 cos kx + a_2 sin kx,
    Y = a_3 e^{−ky} + a_4 e^{ky}.

The boundary conditions u(0, y) = 0 and lim_{y→∞} u(x, y) = 0 imply that a_1 = a_4 = 0. The most interesting condition for the purpose of this example is that the boundary condition u(π, y) = 0 implies that k is an integer. Thus, the eigenvalues of this eigenvalue problem are the integers, and the corresponding eigenfunctions (eigenvectors) are of the form a_k e^{−ky} sin kx. The fourth boundary condition leads to a problem in Fourier series; the problem of determining the a_k so that the series

    Σ_{k=1}^∞ a_k sin kx

represents the given function f(x) for 0 < x < π.

Although the vector space in this example is of infinite dimension, we 
restrict our attention to the eigenvalue problem in finite dimensional vector 
spaces. In a finite dimensional vector space there exists a simple necessary 
and sufficient condition which determines the eigenvalues of an eigenvalue 
problem. 

The eigenvalue equation can be written in the form (σ − λ)(ξ) = 0. We know that there exists a non-zero vector ξ satisfying this condition if



106 Determinants, Eigenvalues, and Similarity Transformations | III 

and only if σ − λ is singular. Let A = {α_1, ..., α_n} be any basis of V and let A = [a_{ij}] be the matrix representing σ with respect to this basis. Then A − λI = C(λ) is the matrix representing σ − λ. Since A − λI is singular if and only if det (A − λI) = f(λ) = 0, we see that we have proved

Theorem 5.1. A scalar λ is an eigenvalue of σ if and only if it is a solution of the characteristic equation of a matrix representing σ. □

Notice that Theorem 5.1 applies only to scalars. In particular a solution 
of the characteristic equation which is not a scalar is not an eigenvalue. For 
example, if the field of scalars is the field of real numbers, then non-real 
complex solutions of the characteristic equation are not eigenvalues. In the 
published literature on matrices the terms "proper values" and "characteristic 
values" are also used to denote what we have called eigenvalues. But, 
unfortunately, the same terms are often also applied to the solutions of the 
characteristic equation. We call the solutions of the characteristic equation 
characteristic values. Thus, a characteristic value is an eigenvalue if and only 
if it is also in the given field of scalars. This distinction between eigenvalues 
and characteristic values is not standard in the literature on matrices, but we 
hope this or some other means of distinguishing between these concepts will 
become conventional. 

In abstract algebra a field is said to be algebraically closed if every poly- 
nomial with coefficients in the field factors into linear factors in the field. 
The field of complex numbers is algebraically closed. Though many proofs 
of this assertion are known, none is elementary. It is easy to show that 
algebraically closed fields exist, but it is not easy to show that a specific field is 
algebraically closed. 

Since for most applications of concepts using eigenvalues or characteristic values the underlying field is either rational, real, or complex, we content ourselves with the observation that the concepts of eigenvalue and characteristic value coincide if the underlying field is complex, and do not coincide if the underlying field is rational or real.

The procedure for finding the eigenvalues and eigenvectors of σ is fairly direct. For some basis A = {α_1, ..., α_n}, let A be the matrix representing σ. Determine the characteristic matrix C(x) = A − xI and the characteristic equation det (A − xI) = f(x) = 0. Solve the characteristic equation. (It is this step that presents the difficulties. The characteristic equation may have no solution in F. In that event the eigenvalue problem has no solution. Even in those cases where solutions exist, finding them can present practical difficulties.) For each solution λ of f(x) = 0, solve the system of homogeneous equations

    (A − λI) X = C(λ) · X = 0.                                   (5.1)


5 | Eigenvalues and Eigenvectors 107 

Since this system of equations has positive nullity, non-zero solutions exist and we should use the Hermite normal form to find them. All solutions are the representations of eigenvectors corresponding to λ.

Generally, we are given the matrix A rather than σ itself, and in this case we regard the problem as solved when the eigenvalues and the representations of the eigenvectors are obtained. We refer to the eigenvalues and eigenvectors of σ as eigenvalues and eigenvectors, respectively, of A.
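With a computer algebra system the procedure can be carried out exactly as described. The following sympy sketch is illustrative only; the matrix is the one treated in Example 1 of Section 6.

    import sympy as sp

    x = sp.symbols('x')
    A = sp.Matrix([[-1, 2, 2], [2, 2, 2], [-3, -6, -6]])

    C = A - x * sp.eye(3)            # the characteristic matrix C(x)
    f = C.det()                      # the characteristic polynomial
    eigenvalues = sp.solve(f, x)     # solutions of f(x) = 0 lying in the field

    for lam in eigenvalues:
        # solve (A - lam*I) X = 0; the nullspace vectors represent eigenvectors
        for v in (A - lam * sp.eye(3)).nullspace():
            print(lam, list(v))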

Theorem 5.2. Similar matrices have the same eigenvalues and eigenvectors. 

proof. This follows directly from the definitions since the eigenvalues 

and eigenvectors are associated with the underlying linear transformation. □ 

Theorem 5.3. Similar matrices have the same characteristic polynomial.
proof. Let A and A' = P⁻¹AP be similar. Then

    det (A' − xI) = det (P⁻¹AP − xI) = det (P⁻¹(A − xI)P)
                  = det P⁻¹ · det (A − xI) · det P = det (A − xI) = f(x).  □

We call the characteristic polynomial of any matrix representing σ the characteristic polynomial of σ. Theorem 5.3 shows that the characteristic polynomial of a linear transformation is uniquely defined.

Let S(λ) be the set of all eigenvectors of σ corresponding to λ, together with 0.

Theorem 5.4. S(λ) is a subspace of V.
proof. If α and β ∈ S(λ), then

    σ(aα + bβ) = aσ(α) + bσ(β)
               = aλα + bλβ
               = λ(aα + bβ).                                     (5.2)

Hence, aα + bβ ∈ S(λ) and S(λ) is a subspace. □

We call S(λ) the eigenspace of σ corresponding to λ, and any subspace of S(λ) is called an eigenspace of σ.

The dimension of S(λ) is equal to the nullity of C(λ), the characteristic matrix of A with λ substituted for the indeterminate x. The dimension of S(λ) is called the geometric multiplicity of λ. We have shown that λ is also a solution of the characteristic equation f(x) = 0. Hence, (x − λ) is a factor of f(x). If (x − λ)^k is a factor of f(x) while (x − λ)^{k+1} is not, λ is a root of f(x) = 0 of multiplicity k. We refer to this multiplicity as the algebraic multiplicity of λ.

Theorem 5.5. The geometric multiplicity of λ does not exceed the algebraic multiplicity of λ.

proof. Since the geometric multiplicity of λ is defined independently of any matrix representing σ, and the characteristic equation is the same for all matrices representing σ, it will be sufficient to prove the theorem for any particular matrix representing σ. We shall choose the matrix representing σ so that the assertion of the theorem is evident. Let r be the dimension of S(λ) and let {ξ_1, ..., ξ_r} be a basis of S(λ). This linearly independent set can be extended to a basis {ξ_1, ..., ξ_n} of V. Since σ(ξ_i) = λξ_i for i ≤ r, the matrix A representing σ with respect to this basis has the form

        | λ  0  ⋯  0   a_{1,r+1}    ⋯  a_{1n}     |
        | 0  λ  ⋯  0   a_{2,r+1}    ⋯  a_{2n}     |
        | ⋮        ⋮       ⋮                ⋮     |
    A = | 0  0  ⋯  λ   a_{r,r+1}    ⋯  a_{rn}     |              (5.3)
        | 0  0  ⋯  0   a_{r+1,r+1}  ⋯  a_{r+1,n}  |
        | ⋮        ⋮       ⋮                ⋮     |
        | 0  0  ⋯  0   a_{n,r+1}    ⋯  a_{nn}     |

From the form of A it is evident that det (A − xI) = f(x) is divisible by (x − λ)^r. Therefore, the algebraic multiplicity of λ is at least r, which is the geometric multiplicity. □

Theorem 5.6. If the eigenvalues λ_1, ..., λ_s are all different and {ξ_1, ..., ξ_s} is a set of eigenvectors, ξ_i corresponding to λ_i, then the set {ξ_1, ..., ξ_s} is linearly independent.

proof. Suppose the set is dependent and that we have reordered the eigenvectors so that the first k eigenvectors are linearly independent and the last s − k are dependent on them. Then

    ξ_s = Σ_{i=1}^k a_i ξ_i,

where the representation is unique. Not all a_i = 0 since ξ_s ≠ 0. Upon applying the linear transformation σ we have

    λ_s ξ_s = Σ_{i=1}^k a_i λ_i ξ_i.

There are two possibilities to be considered. If λ_s = 0, then none of the λ_i for i ≤ k is zero since the eigenvalues are distinct. This would imply that {ξ_1, ..., ξ_k} is linearly dependent, contrary to assumption. If λ_s ≠ 0, then

    ξ_s = Σ_{i=1}^k a_i (λ_i / λ_s) ξ_i.

Since not all a_i = 0 and λ_i/λ_s ≠ 1, this would contradict the uniqueness of the representation of ξ_s. Since we get a contradiction in any event, the set {ξ_1, ..., ξ_s} must be linearly independent. □

EXERCISES 

1. Show that λ = 0 is an eigenvalue of a matrix A if and only if A is singular.

2. Show that if ξ is an eigenvector of σ, then ξ is also an eigenvector of σ^n for each n > 0. If λ is the eigenvalue of σ corresponding to ξ, what is the eigenvalue of σ^n corresponding to ξ?

3. Show that if ξ is an eigenvector of both σ and τ, then ξ is also an eigenvector of aσ (for a ∈ F) and of σ + τ. If λ_1 is the eigenvalue of σ and λ_2 is the eigenvalue of τ corresponding to ξ, what are the eigenvalues of aσ and σ + τ?

4. Show, by producing an example, that if λ_1 and λ_2 are eigenvalues of σ_1 and σ_2, respectively, it is not necessarily true that λ_1 + λ_2 is an eigenvalue of σ_1 + σ_2.

5. Show that if ξ is an eigenvector of σ, then it is also an eigenvector of p(σ) where p(x) is a polynomial with coefficients in F. If λ is an eigenvalue of σ corresponding to ξ, what is the eigenvalue of p(σ) corresponding to ξ?

6. Show that if σ is non-singular and λ is an eigenvalue of σ, then λ⁻¹ is an eigenvalue of σ⁻¹. What is the corresponding eigenvector?

7. Show that if every vector in V is an eigenvector of σ, then σ is a scalar transformation.

8. Let P_n be the vector space of polynomials of degree at most n − 1, and let D be the differentiation operator; that is, D(t^k) = kt^{k−1}. Determine the characteristic polynomial for D. From your knowledge of the differentiation operator, and not using Theorem 4.3, determine the minimum polynomial for D. What kind of differential equation would an eigenvector of D have to satisfy? What are the eigenvectors of D?

9. Let A = [a_{ij}]. Show that if Σ_j a_{ij} = c independent of i, then ξ = (1, 1, ..., 1) is an eigenvector. What is the corresponding eigenvalue?

10. Let W be an invariant subspace of V under σ, and let A = {α_1, ..., α_n} be a basis of V such that {α_1, ..., α_k} is a basis of W. Let A = [a_{ij}] be the matrix representing σ with respect to the basis A. Show that all elements in the first k columns below the kth row are zeros.

11. Show that if λ_1 and λ_2 ≠ λ_1 are eigenvalues of σ, and ξ_1 and ξ_2 are eigenvectors corresponding to λ_1 and λ_2, respectively, then ξ_1 + ξ_2 is not an eigenvector.

12. Assume that {ξ_1, ..., ξ_r} are eigenvectors with distinct eigenvalues. Show that Σ_{i=1}^r a_i ξ_i is never an eigenvector unless precisely one coefficient is non-zero.



110 



Determinants, Eigenvalues, and Similarity Transformations | HI 



13. Let A be an n × n matrix with eigenvalues λ_1, λ_2, ..., λ_n. Show that if Λ is the diagonal matrix

        | λ_1              |
    Λ = |      λ_2         |
        |           ⋱      |
        |              λ_n |

and P = [p_{ij}] is the matrix in which column j is the n-tuple representing an eigenvector corresponding to λ_j, then AP = PΛ.

14. Use the notation of Exercise 13. Show that if A has n linearly independent eigenvectors, then the eigenvectors can be chosen so that P is non-singular. In this case P⁻¹AP = Λ.



6 I Some Numerical Examples 

Since we are interested here mainly in the numerical procedures, we 
start with the matrices representing the linear transformations and obtain 
their eigenvalues and the representations of the eigenvectors. 
Example 1. Let

        | −1  2  2 |
    A = |  2  2  2 |
        | −3 −6 −6 |

The first step is to obtain the characteristic matrix

           | −1 − x    2        2     |
    C(x) = |  2        2 − x    2     |
           | −3       −6       −6 − x |

and then the characteristic polynomial

    det C(x) = −(x + 2)(x + 3)x.

Thus the eigenvalues of A are λ_1 = −2, λ_2 = −3, and λ_3 = 0. The next steps are to substitute, successively, the eigenvalues for x in the characteristic matrix. Thus we have

            |  1  2  2 |
    C(−2) = |  2  4  2 |
            | −3 −6 −4 |

The Hermite normal form obtained from C(−2) is

    | 1  2  0 |
    | 0  0  1 |
    | 0  0  0 |

The components of the eigenvector corresponding to λ_1 = −2 are found by solving the equations

    x_1 + 2x_2 = 0
            x_3 = 0.

Thus (2, −1, 0) is the representation of an eigenvector corresponding to λ_1; for simplicity we shall write ξ_1 = (2, −1, 0), identifying the vector with its representation.
In a similar fashion we obtain

            |  2  2  2 |
    C(−3) = |  2  5  2 |
            | −3 −6 −3 |

From C(−3) we obtain the Hermite normal form

    | 1  0  1 |
    | 0  1  0 |
    | 0  0  0 |

and hence the eigenvector ξ_2 = (1, 0, −1). Similarly, from

           | −1  2  2 |
    C(0) = |  2  2  2 |
           | −3 −6 −6 |

we obtain the eigenvector ξ_3 = (0, 1, −1).

By Theorem 5.6 the three eigenvectors obtained for the three different 
eigenvalues are linearly independent. 

Example 2. Let 



        |  1  1 −1 |
    A = | −1  3 −1 |
        | −1  2  0 |

From the characteristic matrix

           | 1 − x    1      −1 |
    C(x) = | −1       3 − x  −1 |
           | −1       2      −x |

we obtain the characteristic polynomial det C(x) = −(x − 1)²(x − 2). Thus we have just two distinct eigenvalues; λ_1 = λ_2 = 1 with algebraic multiplicity two, and λ_3 = 2.

Substituting λ_1 for x in the characteristic matrix we obtain

           |  0  1 −1 |
    C(1) = | −1  2 −1 |
           | −1  2 −1 |

The corresponding Hermite normal form is

    | 1  0 −1 |
    | 0  1 −1 |
    | 0  0  0 |

Thus it is seen that the nullity of C(1) is 1. The eigenspace S(1) is of dimension 1 and the geometric multiplicity of the eigenvalue 1 is 1. This shows that the geometric multiplicity can be lower than the algebraic multiplicity. We obtain

    ξ_1 = (1, 1, 1).

The eigenvector corresponding to λ_3 = 2 is ξ_3 = (0, 1, 1).
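These computations can be checked with a numerical routine. The following numpy sketch, applied to the matrix of Example 1, is illustrative only; the output is floating point, so the integer eigenvalues appear only approximately.

    import numpy as np

    A = np.array([[-1.,  2.,  2.],
                  [ 2.,  2.,  2.],
                  [-3., -6., -6.]])

    values, vectors = np.linalg.eig(A)
    print(np.round(values, 6))          # approximately -2, -3, 0, in some order

    # each column of `vectors` is a (normalized) eigenvector; check A v = lambda v
    for lam, v in zip(values, vectors.T):
        print(np.allclose(A @ v, lam * v))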



EXERCISES 



For each of the following matrices find all the eigenvalues and as many linearly 
independent eigenvectors as possible. 



1.  | 2  4 |            2.  | 3 −2 |
    | 5  3 |                | 2  3 |

3.  | 1  2 |            4.  | 1  −√2 |
    | 2 −2 |                | √2   4 |

5.  |  4  9  0 |        6.  |  3  2  2 |
    | −2  8  0 |            |  1  4  1 |
    |  0  0  7 |            | −2 −4 −1 |

7.  |  7  4 −4 |        8.  | 2 −i  0 |
    |  4 −8 −1 |            | i  2  0 |
    | −4 −1 −8 |            | 0  0  3 |



7 | Similarity 



113 



7 I Similarity 

Generally, for a given linear transformation a we seek a basis for which 
the matrix representing a has as simple a form as possible. The simplest 
form is that in which the elements not on the main diagonal are zero, a 
diagonal matrix. Not all linear transformations can be represented by 
diagonal matrices, but relatively large classes of transformations can be 
represented by diagonal matrices, and we seek conditions under which 
such a representation exists. 

Theorem 7.1. A linear transformation σ can be represented by a diagonal matrix if and only if there exists a basis consisting of eigenvectors of σ.

proof. Suppose there is a linearly independent set X = {ξ_1, ..., ξ_n} of eigenvectors and that {λ_1, ..., λ_n} are the corresponding eigenvalues. Then σ(ξ_i) = λ_i ξ_i, so that the matrix representing σ with respect to the basis X has the form

    | λ_1              |
    |      λ_2         |
    |           ⋱      |                                         (7.1)
    |              λ_n |

that is, σ is represented by a diagonal matrix.

Conversely, if σ is represented by a diagonal matrix, the vectors in that basis are eigenvectors. □

Usually, we are not given the linear transformation σ directly. We are given a matrix A representing σ with respect to an unspecified basis. In this case Theorem 7.1 is usually worded in the form: A matrix A is similar to a diagonal matrix if and only if there exist n linearly independent eigenvectors of A. In this form a computation is required. We must find the matrix P such that P⁻¹AP is a diagonal matrix.

Let the matrix A be given; that is, A represents σ with respect to some basis A = {α_1, ..., α_n}. Let ξ_j = Σ_{i=1}^n p_{ij} α_i be the representations of the eigenvectors of A with respect to A. Then the matrix A' representing σ with respect to the basis X = {ξ_1, ..., ξ_n} is P⁻¹AP = A'. By Theorem 7.1, A' is a diagonal matrix.

In Example 1 of Section 6, the matrix 



        | −1  2  2 |
    A = |  2  2  2 |
        | −3 −6 −6 |

has three linearly independent eigenvectors, ξ_1 = (2, −1, 0), ξ_2 = (1, 0, −1), and ξ_3 = (0, 1, −1). The matrix of transition P has the components of these vectors written in its columns:

        |  2  1  0 |               |  1  1  1 |
    P = | −1  0  1 |,      P⁻¹ =   | −1 −2 −2 |.
        |  0 −1 −1 |               |  1  2  1 |

The reader should check that P⁻¹AP is a diagonal matrix with the eigenvalues appearing in the main diagonal.
In Example 2 of Section 6, the matrix

        |  1  1 −1 |
    A = | −1  3 −1 |
        | −1  2  0 |
has one linearly independent eigenvector corresponding to each of its two 
eigenvalues. As there are no other eigenvalues, there does not exist a set of 
three linearly independent eigenvectors. Thus, the linear transformation 
represented by this matrix cannot be represented by a diagonal matrix; 
A is not similar to a diagonal matrix. 
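The check suggested for Example 1 takes only a few lines of numpy. The sketch below is illustrative only; P is the matrix of transition displayed above.

    import numpy as np

    A = np.array([[-1,  2,  2],
                  [ 2,  2,  2],
                  [-3, -6, -6]], dtype=float)

    # columns of P are the eigenvectors (2, -1, 0), (1, 0, -1), (0, 1, -1)
    P = np.array([[ 2,  1,  0],
                  [-1,  0,  1],
                  [ 0, -1, -1]], dtype=float)

    D = np.linalg.inv(P) @ A @ P
    print(np.round(D, 10))      # diagonal, with -2, -3, 0 on the main diagonal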

Corollary 7.2. If σ can be represented by a diagonal matrix D, the elements in the main diagonal of D are the eigenvalues of σ. □

Theorem 7.3. If an n × n matrix A has n distinct eigenvalues, then A is similar to a diagonal matrix.

proof. By Theorem 5.6 the n eigenvectors corresponding to the n eigen- 
values of A are linearly independent and form a basis. By Theorem 7.1 
the matrix representing the underlying linear transformation with respect 
to this basis is a diagonal matrix. Hence, A is similar to a diagonal matrix. □ 

Theorem 7.3 is quite practical because we expect the eigenvalues of a 
randomly given matrix to be distinct; however, there are circumstances 
under which the theorem does not apply. There may not be n distinct 
eigenvalues, either because some have algebraic multiplicity greater than 
one or because the characteristic equation does not have enough solutions in 
the field. The most general statement that can be made without applying 
more conditions to yield more results is 

Theorem 7.4. A necessary and sufficient condition that a matrix A be 
similar to a diagonal matrix is that its minimum polynomial factor into distinct 
linear factors with coefficients in F. 



7 | Similarity 115 

proof. Suppose first that the matrix A is similar to a diagonal matrix D. By Theorem 5.3, A and D have the same characteristic polynomial. Since D is a diagonal matrix, the elements in the main diagonal are the solutions of the characteristic equation and the characteristic polynomial must factor into linear factors. By Theorem 4.4 the minimum polynomial for A must contain each of the linear factors of f(x), although possibly with lower multiplicity. It can be seen, however, either from Theorem 4.3 or by direct substitution, that D satisfies an equation without repeated factors. Thus, the minimum polynomial for A has distinct linear factors.

On the other hand, suppose that the minimum polynomial for A is m(x) = (x − λ_1)(x − λ_2) ··· (x − λ_p) with distinct linear factors. Let M_i be the kernel of σ − λ_i. The non-zero vectors in M_i are the eigenvectors of σ corresponding to λ_i. It follows from Theorem 5.6 that a non-zero vector in M_i cannot be expressed as a sum of vectors in the other M_j. Hence, the sum M_1 + M_2 + ··· + M_p is direct.

Let ν_i = dim M_i; that is, ν_i is the nullity of σ − λ_i. Since M_1 ⊕ ··· ⊕ M_p ⊂ V we have ν_1 + ··· + ν_p ≤ n. By Theorem 1.5 of Chapter II, dim (σ − λ_i)V = n − ν_i = ρ_i. By another application of the same theorem we have dim (σ − λ_i){(σ − λ_j)V} ≥ ρ_j − ν_i = n − (ν_i + ν_j).

Finally, by repeated application of the same ideas we obtain 0 = dim m(σ)V ≥ n − (ν_1 + ··· + ν_p). Thus, ν_1 + ··· + ν_p = n. This shows that M_1 ⊕ ··· ⊕ M_p = V. Since every vector in V is a linear combination of eigenvectors, there exists a basis of eigenvectors. Thus, A is similar to a diagonal matrix. □

Theorem 7.4 is important in the theory of matrices, but it does not provide the most effective means for deciding whether a particular matrix is similar to a diagonal matrix. If we can solve the characteristic equation, it is easier to try to find the n linearly independent eigenvectors than it is to use Theorem 7.4 to ascertain whether they do or do not exist. If we do use this theorem and are able to conclude that a basis of eigenvectors does exist, the work done in reaching this conclusion is of no help in the attempt to find the eigenvectors. The straightforward attempt to find the eigenvectors is always conclusive. On the other hand, if it is not necessary to find the eigenvectors, Theorem 7.4 can help us reach the necessary conclusion without solving the characteristic equation.
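When only the yes-or-no question matters, the criterion of Theorem 7.4 can be applied mechanically by computing the minimum polynomial as in Theorem 4.3. The following sympy sketch is illustrative only; the matrix is the one of Example 2 of Section 6, which we already know is not similar to a diagonal matrix.

    import sympy as sp
    from functools import reduce

    x = sp.symbols('x')

    def minimum_polynomial(A):
        # Theorem 4.3: m(x) = f(x)/g(x), where f is the characteristic
        # polynomial and g is the gcd of the elements of adj(A - xI)
        n = A.shape[0]
        C = A - x * sp.eye(n)
        f = C.det()
        g = reduce(sp.gcd, C.adjugate())
        return sp.cancel(f / g)

    A = sp.Matrix([[1, 1, -1], [-1, 3, -1], [-1, 2, 0]])
    m = minimum_polynomial(A)
    print(sp.factor(m))     # the factor (x - 1) appears squared, so by
                            # Theorem 7.4 A is not similar to a diagonal matrix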

For any square matrix A = [a_{ij}], Tr(A) = Σ_{i=1}^n a_{ii} is called the trace of A. It is the sum of the elements in the main diagonal of A. Since Tr(AB) = Σ_i ( Σ_j a_{ij} b_{ji} ) = Σ_j ( Σ_i b_{ji} a_{ij} ) = Tr(BA),

    Tr(P⁻¹AP) = Tr(APP⁻¹) = Tr(A).                               (7.2)

This shows that the trace is invariant under similarity transformations;



116 



Determinants, Eigenvalues, and Similarity Transformations | III 



that is, similar matrices have the same trace. For a given linear transformation σ of V into itself, all matrices representing σ have the same trace. Thus we can define Tr(σ) to be the trace of any matrix representing σ.

Consider the coefficient of x^{n−1} in the expansion of the determinant of the characteristic matrix,

    | a_{11} − x   a_{12}      ⋯   a_{1n}     |
    | a_{21}       a_{22} − x  ⋯   a_{2n}     |
    |    ⋮             ⋮               ⋮      |                  (7.3)
    | a_{n1}       a_{n2}      ⋯   a_{nn} − x |

The only way an x^{n−1} can be obtained is from a product of n − 1 of the diagonal elements, multiplied by the scalar from the remaining diagonal element. Thus, the coefficient of x^{n−1} is (−1)^{n−1} Σ_{i=1}^n a_{ii}, or (−1)^{n−1} Tr(A). If f(x) = det (A − xI) is the characteristic polynomial of A, then det A = f(0) is the constant term of f(x). If f(x) is factored into linear factors in the form

    f(x) = (−1)^n (x − λ_1)^{e_1} (x − λ_2)^{e_2} ··· (x − λ_p)^{e_p},       (7.4)

the constant term is ∏_{i=1}^p λ_i^{e_i}. Thus det A is the product of the characteristic values (each counted with the multiplicity with which it is a factor of the characteristic polynomial). In a similar way it can be seen that Tr(A) is the sum of the characteristic values (each counted with multiplicity).
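Both statements are easy to confirm numerically. The following numpy sketch is illustrative only; the matrix is the one of Example 2 of Section 6, whose characteristic values are 1, 1, and 2.

    import numpy as np

    A = np.array([[ 1., 1., -1.],
                  [-1., 3., -1.],
                  [-1., 2.,  0.]])

    values = np.linalg.eigvals(A)
    print(np.isclose(np.trace(A), values.sum()))        # trace = sum of characteristic values
    print(np.isclose(np.linalg.det(A), values.prod()))  # det = product of characteristic values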

We have now shown the existence of several objects associated with a matrix, or its underlying linear transformation, which are independent of the coordinate system. For example, the characteristic polynomial, the determinant, and the trace are independent of the coordinate system. Actually, this list is redundant since det A is the constant term of the characteristic polynomial, and Tr(A) is (−1)^{n−1} times the coefficient of x^{n−1} of the characteristic polynomial. Functions of this type are of interest because they contain information about the linear transformation, or the matrix, and they are sometimes rather easy to evaluate. But this raises a host of questions. What information do these invariants contain? Can we find a complete list of non-redundant invariants, in the sense that any other invariant can be computed from those in the list? While some partial answers to these questions will be given, a systematic discussion of these questions is beyond the scope of this book.



7 | Similarity 117 

Theorem 7.5. Let V be a vector space with a basis consisting of eigenvectors of σ. If W is any subspace of V invariant under σ, then W also has a basis consisting of eigenvectors of σ.

proof. Let α be any vector in W. Since V has a basis of eigenvectors of σ, α can be expressed as a linear combination of eigenvectors of σ. By disregarding terms with zero coefficients, combining terms corresponding to the same eigenvalue, and renaming a term like a_i ξ_i, where ξ_i is an eigenvector and a_i ≠ 0, as an eigenvector with coefficient 1, we can represent α in the form

    α = Σ_{i=1}^r ξ_i,

where the ξ_i are eigenvectors of σ with distinct eigenvalues. Let λ_i be the eigenvalue corresponding to ξ_i. We will show that each ξ_i ∈ W.

(σ − λ_2)(σ − λ_3) ··· (σ − λ_r)(α) is in W since W is invariant under σ, and hence invariant under σ − λ for any scalar λ. But then (σ − λ_2)(σ − λ_3) ··· (σ − λ_r)(α) = (λ_1 − λ_2)(λ_1 − λ_3) ··· (λ_1 − λ_r) ξ_1 ∈ W, and ξ_1 ∈ W since (λ_1 − λ_2)(λ_1 − λ_3) ··· (λ_1 − λ_r) ≠ 0. A similar argument shows that each ξ_i ∈ W.

Since this argument applies to any α ∈ W, W is spanned by eigenvectors of σ. Thus, W has a basis of eigenvectors of σ. □

Theorem 7.6. Let V be a vector space over C, the field of complex numbers. Let σ be a linear transformation of V into itself. V has a basis of eigenvectors for σ if and only if for every subspace S invariant under σ there is a subspace T invariant under σ such that V = S ⊕ T.

proof. The theorem is obviously true if V is of dimension 1. Assume the assertions of the theorem are correct for spaces of dimension less than n, where n is the dimension of V.

Assume first that for every subspace S invariant under σ there is a complementary subspace T also invariant under σ. Since V is a vector space over the complex numbers, σ has at least one eigenvalue λ_1. Let α_1 be an eigenvector corresponding to λ_1. The subspace S_1 = ⟨α_1⟩ is then invariant under σ. By assumption there is a subspace T_1 invariant under σ such that V = S_1 ⊕ T_1.

Every subspace S_2 of T_1 invariant under the restriction of σ to T_1 is also invariant under σ. Thus there exists a subspace T_2 of V invariant under σ such that V = S_2 ⊕ T_2. Now S_2 ⊂ T_1 and T_1 = S_2 ⊕ (T_2 ∩ T_1). (See Exercise 15, Section 1-4.) Since T_2 ∩ T_1 is invariant under σ, and therefore under the restriction of σ to T_1, the induction assumption holds for the subspace T_1. Thus, T_1 has a basis of eigenvectors, and by adjoining α_1 to this basis we obtain a basis of eigenvectors of V.

Now assume there is a basis of V consisting of eigenvectors of σ. By Theorem 7.5 any invariant subspace S has a basis of eigenvectors. The method of proof of Theorem 2.7 of Chapter I (the Steinitz replacement theorem) will yield a basis of V consisting of eigenvectors of σ, and this basis will contain the basis of S consisting of eigenvectors. The eigenvectors adjoined will span a subspace T, and this subspace will be invariant under σ and complementary to S. □

EXERCISES

1. For each matrix A given in the exercises of Section 6 find, when possible, a non-singular matrix P for which P^{-1}AP is diagonal.

2. Show that the matrix

    [ 1  c ]
    [ 0  1 ]

where c ≠ 0, is not similar to a diagonal matrix.

3. Show that any 2 x 2 matrix satisfying x^2 + 1 = 0 is similar to the matrix

    [ 0  -1 ]
    [ 1   0 ]

4. Show that if A is non-singular, then AB is similar to BA.

5. Show that any two projections of the same rank are similar.
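Exercise 4 can also be checked numerically. The sketch below (matrices chosen arbitrarily, not from the text) uses the observation that when A is non-singular, A^{-1}(AB)A = BA, so A itself serves as the matrix of transition.

```python
# Numerical illustration of Exercise 4: if A is non-singular then A^{-1}(AB)A = BA,
# so AB and BA are similar.  The matrices are arbitrary examples.
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 5.0]])   # non-singular: det = -1
B = np.array([[0.0, 1.0], [4.0, 2.0]])

AB, BA = A @ B, B @ A
print(np.allclose(np.linalg.inv(A) @ AB @ A, BA))   # True
print(np.poly(AB), np.poly(BA))                     # identical characteristic polynomials
```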

*8 | The Jordan Normal Form

A normal form that is obtainable in general when the field of scalars is the field of complex numbers is known as the Jordan normal form. An application of the Jordan normal form to power series of matrices and systems of linear differential equations is given in the chapter on applications. Except for these applications this section can be skipped without penalty.

We assume that the field of scalars is the field of complex numbers. Thus for any square matrix A the characteristic polynomial f(x) factors into linear factors, f(x) = (x - λ_1)^{r_1}(x - λ_2)^{r_2} ··· (x - λ_p)^{r_p}, where λ_i ≠ λ_j for i ≠ j and r_i is the algebraic multiplicity of the eigenvalue λ_i. The minimum polynomial m(x) for A is of the form m(x) = (x - λ_1)^{s_1}(x - λ_2)^{s_2} ··· (x - λ_p)^{s_p}, where 1 ≤ s_i ≤ r_i.

In the theorems about the diagonalization of matrices we sought bases made up of eigenvectors. Because we are faced with the possibility that such bases do not exist, we must seek proper generalizations of the eigenvectors. It is more fruitful to think of the eigenspaces rather than the eigenvectors themselves. An eigenvalue is a scalar λ for which the linear transformation σ - λ is singular. An eigenspace is the kernel (of positive dimension) of the linear transformation σ - λ. The proper generalization of eigenspaces turns out to be the kernels of higher powers of σ - λ. For a given eigenvalue λ, let M^k be the kernel of (σ - λ)^k. Thus, M^0 = {0} and M^1 is the eigenspace of λ. For α ∈ M^k, (σ - λ)^{k+1}(α) = (σ - λ)(σ - λ)^k(α) = (σ - λ)(0) = 0. Hence, M^k ⊂ M^{k+1}. Also, for α ∈ M^{k+1}, (σ - λ)^k(σ - λ)(α) = (σ - λ)^{k+1}(α) = 0 so that (σ - λ)(α) ∈ M^k. Hence, (σ - λ)M^{k+1} ⊂ M^k.

Since all M^k ⊂ V and V is finite dimensional, the sequence of subspaces M^0 ⊂ M^1 ⊂ M^2 ⊂ ··· must eventually stop increasing. Let t be the smallest index such that M^k = M^t for all k ≥ t, and denote M^t by M^{(λ)}. Let m_k be the dimension of M^k and m_t the dimension of M^{(λ)}.

Let (σ - λ)^k V = W^k. Then W^{k+1} = (σ - λ)^{k+1}V = (σ - λ)^k{(σ - λ)V} ⊂ (σ - λ)^k V = W^k. Thus, the subspaces W^k form a decreasing sequence W^0 ⊃ W^1 ⊃ W^2 ⊃ ···. Since the dimension of W^k is n - m_k, we see that W^k = W^t for all k ≥ t. Denote W^t by W^{(λ)}.

Theorem 8.1. V is the direct sum of M^{(λ)} and W^{(λ)}.

proof. Since (σ - λ)W^t = (σ - λ)^{t+1}V = W^{t+1} = W^t, we see that σ - λ is non-singular on W^t = W^{(λ)}. Now let α be any vector in V. Then (σ - λ)^t(α) = β is an element of W^{(λ)}. Because (σ - λ)^t is non-singular on W^{(λ)} there is a unique vector γ ∈ W^{(λ)} such that (σ - λ)^t(γ) = β. Let α - γ be denoted by δ. It is easily seen that δ ∈ M^{(λ)}. Hence V = M^{(λ)} + W^{(λ)}. Finally, since dim M^{(λ)} = m_t and dim W^{(λ)} = n - m_t, the sum is direct. □

In the course of defining M^k and W^k we have shown that

(1) (σ - λ)M^{k+1} ⊂ M^k ⊂ M^{k+1},

(2) (σ - λ)W^k = W^{k+1} ⊂ W^k.

This shows that each M^k and W^k is invariant under σ - λ. It follows immediately that each is invariant under any polynomial in σ - λ, and hence also under any polynomial in σ. The use we wish to make of this observation is that if μ is any other eigenvalue, then σ - μ also maps M^{(λ)} and W^{(λ)} into themselves.

Let λ_1, ..., λ_p be the distinct eigenvalues of σ. Let M_i be a simpler notation for the subspace M^{(λ_i)} defined as above, and let W_i be a simpler notation for W^{(λ_i)}.

Theorem 8.2. For λ_i ≠ λ_j, M_i ⊂ W_j.

proof. Suppose α ∈ M_i is in the kernel of σ - λ_j. Then

    (λ_j - λ_i)^{t_i}(α) = {(σ - λ_i) - (σ - λ_j)}^{t_i}(α)
                         = (σ - λ_i)^{t_i}(α) + Σ_{k=1}^{t_i} (-1)^k (t_i choose k)(σ - λ_i)^{t_i - k}(σ - λ_j)^k(α).

The first term is zero because α ∈ M_i, and the others are zero because α is in the kernel of σ - λ_j. Since λ_j - λ_i ≠ 0, it follows that α = 0. This means that σ - λ_j is non-singular on M_i, that is, σ - λ_j maps M_i onto itself. Thus M_i is contained in the set of images under (σ - λ_j)^{t_j}, and hence M_i ⊂ W_j. □

Theorem 8.3. V = M_1 ⊕ M_2 ⊕ ··· ⊕ M_p.

proof. Since V = M_1 ⊕ W_1 and M_2 ⊂ W_1, we have V = M_1 ⊕ W_1 = M_1 ⊕ {M_2 ⊕ (W_1 ∩ W_2)}. Continuing in the same fashion, we get V = M_1 ⊕ ··· ⊕ M_p ⊕ {W_1 ∩ ··· ∩ W_p}. Thus the theorem will follow if we can show that W = W_1 ∩ ··· ∩ W_p = {0}. By an extension of remarks already made, (σ - λ_1) ··· (σ - λ_p) = q(σ) is non-singular on W; that is, q(σ) maps W onto itself. For arbitrarily large k, [q(σ)]^k also maps W onto itself. But q(x) contains each factor of the characteristic polynomial f(x) so that for large enough k, [q(x)]^k is divisible by f(x). This implies that W = {0}. □

Corollary 8.4. t_i = s_i for i = 1, ..., p.

proof. Since V = M_1 ⊕ ··· ⊕ M_p and (σ - λ_i)^{t_i} vanishes on M_i, it follows that (σ - λ_1)^{t_1} ··· (σ - λ_p)^{t_p} vanishes on all of V. Thus (x - λ_1)^{t_1} ··· (x - λ_p)^{t_p} is divisible by the minimum polynomial and s_i ≤ t_i.

On the other hand, if for a single i we have s_i < t_i, there is an α ∈ M_i such that (σ - λ_i)^{s_i}(α) ≠ 0. For all λ_j ≠ λ_i, σ - λ_j is non-singular on M_i. Hence m(σ)(α) ≠ 0. This is a contradiction, so that t_i = s_i. □

Let us return to the situation where, for the single eigenvalue λ, M^k is the kernel of (σ - λ)^k and W^k = (σ - λ)^k V. In view of Corollary 8.4 we let s be the smallest index such that M^k = M^s for all k ≥ s. By induction we can construct a basis {α_1, ..., α_m} of M^{(λ)} such that {α_1, ..., α_{m_k}} is a basis of M^k.

We now proceed step by step to modify this basis. The set {α_{m_{s-1}+1}, ..., α_m} consists of those basis elements in M^s which are not in M^{s-1}. These elements do not have to be replaced, but for consistency of notation we change their names; let β_{m_{s-1}+ν} = α_{m_{s-1}+ν}. Now set (σ - λ)(β_{m_{s-1}+ν}) = β_{m_{s-2}+ν} and consider the set {α_1, ..., α_{m_{s-2}}} ∪ {β_{m_{s-2}+1}, ..., β_{m_{s-2}+m_s-m_{s-1}}}. We wish to show that this set is linearly independent.

If this set were linearly dependent, a non-trivial relation would exist and it would have to involve at least one of the β's with a non-zero coefficient, since the set {α_1, ..., α_{m_{s-2}}} is linearly independent. But then a non-trivial linear combination of the β's would be an element of M^{s-2}, and (σ - λ)^{s-2} would map this linear combination onto 0. This would mean that (σ - λ)^{s-1} would map a non-trivial linear combination of {α_{m_{s-1}+1}, ..., α_m} onto 0. Then this non-trivial linear combination would be in M^{s-1}, which would contradict the linear independence of {α_1, ..., α_m}. Thus the set {α_1, ..., α_{m_{s-2}}} ∪ {β_{m_{s-2}+1}, ..., β_{m_{s-2}+m_s-m_{s-1}}} is linearly independent.

This linearly independent subset of M^{s-1} can be expanded to a basis of M^{s-1}. We use β's to denote these additional elements of this basis, if any additional elements are required. Thus we have the new basis {α_1, ..., α_{m_{s-2}}, β_{m_{s-2}+1}, ..., β_{m_{s-1}}} of M^{s-1}.

We now set (σ - λ)(β_{m_{s-2}+ν}) = β_{m_{s-3}+ν} and proceed as before to obtain a new basis {α_1, ..., α_{m_{s-3}}} ∪ {β_{m_{s-3}+1}, ..., β_{m_{s-1}}} of M^{s-2}.

Proceeding in this manner we finally get a new basis {β_1, ..., β_m} of M^{(λ)} such that {β_1, ..., β_{m_k}} is a basis of M^k and (σ - λ)(β_{m_k+ν}) = β_{m_{k-1}+ν} for k ≥ 1. This relation can be rewritten in the form

    σ(β_{m_k+ν}) = λβ_{m_k+ν} + β_{m_{k-1}+ν}      for k ≥ 1,
    σ(β_ν)       = λβ_ν                            for ν ≤ m_1.        (8.1)

Thus we see that in a certain sense β_{m_k+ν} is "almost" an eigenvector.

This suggests reordering the basis vectors so that {β_1, β_{m_1+1}, ..., β_{m_{s-1}+1}} are listed first. Next we should like to list the vectors {β_2, β_{m_1+2}, ...}, etc. The general idea is to list each of the first elements from each section of the β's, then each of the second elements from each section, and continue until a new ordering of the basis is obtained.

With the basis of M^{(λ)} listed in this order (and assuming for the moment that M^{(λ)} is all of V) the matrix representing σ takes the form

    | λ  1            |                 |
    |    λ  1         |    all zeros    |
    |       .  .      |                 |   (s rows)
    |          λ  1   |                 |
    |             λ   |                 |
    |-----------------|-----------------|
    |                 |  λ  1           |
    |   all zeros     |     λ  .        |   (at most s rows)
    |                 |        .  1     |
    |                 |           λ     |
    |-----------------|-----------------|
    |            all zeros       etc.   |

That is, blocks of this special form are strung along the main diagonal, the first of order s and the rest of orders at most s, and all entries outside these blocks are zero.



Theorem 8.5. Let A be a matrix with characteristic polynomial f(x) = (x - λ_1)^{r_1} ··· (x - λ_p)^{r_p} and minimum polynomial m(x) = (x - λ_1)^{s_1} ··· (x - λ_p)^{s_p}. A is similar to a matrix J with submatrices of the form

          [ λ_i  1    0   ...  0  ]
          [ 0    λ_i  1   ...  0  ]
    B_i = [ .         .    .   .  ]
          [ 0    ...      λ_i  1  ]
          [ 0    ...      0    λ_i]

along the main diagonal. All other elements of J are zero. For each λ_i there is at least one B_i of order s_i. All other B_i corresponding to this λ_i are of order less than or equal to s_i. The number of B_i corresponding to this λ_i is equal to the geometric multiplicity of λ_i. The sum of the orders of all the B_i corresponding to λ_i is r_i. While the ordering of the B_i along the main diagonal of J is not unique, the number of B_i of each possible order is uniquely determined by A. J is called the Jordan normal form corresponding to A.

proof. From Theorem 8.3 we have V = M_1 ⊕ ··· ⊕ M_p. In the discussion preceding the statement of Theorem 8.5 we have shown that each M_i has a basis of a special type. Since V is the sum of the M_i, the union of these bases spans V. Since the sum is direct, the union of these bases is linearly independent and, hence, a basis for V. This shows that a matrix J of the type described in Theorem 8.5 does represent σ and is therefore similar to A.

The discussion preceding the statement of the theorem also shows that the dimensions m_{ik} of the kernels M_i^k of the various (σ - λ_i)^k determine the orders of the B_i in J. Since A determines σ and σ determines the subspaces M_i^k independently of the bases employed, the B_i are uniquely determined.

Since the λ_i appear along the main diagonal of J and all other non-zero elements of J are above the main diagonal, the number of times x - λ_i appears as a factor of the characteristic polynomial of J is equal to the number of times λ_i appears in the main diagonal. Thus the sum of the orders of the B_i corresponding to λ_i is exactly r_i. This establishes all the statements of Theorem 8.5. □
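For machine computation, sympy's jordan_form carries out the reduction described in this section. A sketch; the matrices J0 and Q below are arbitrary choices used only to manufacture a test matrix whose Jordan form is known in advance.

```python
# Sketch: computing a Jordan normal form with sympy.
from sympy import Matrix

J0 = Matrix([[2, 1, 0],
             [0, 2, 0],
             [0, 0, 3]])          # a Jordan matrix chosen in advance
Q = Matrix([[1, 1, 0],
            [0, 1, 1],
            [1, 0, 1]])           # any non-singular matrix (det = 2)
A = Q * J0 * Q.inv()              # a matrix whose Jordan form is J0

P, J = A.jordan_form()            # A == P * J * P**-1
print(J)                          # recovers the blocks of J0 (possibly reordered)
print(P.inv() * A * P == J)       # True
```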

Let us illustrate the workings of the theorems of this section with some examples. Unfortunately, it is a little difficult to construct an interesting example of low order. Hence, we give two examples. The first example illustrates the choice of basis as described for the space M^{(λ)}. The second example illustrates the situation described by Theorem 8.3.
Example 1. Let

        [  1   0  -1   1   0 ]
        [ -4   1  -3   2   1 ]
    A = [ -2  -1   0   1   1 ]
        [ -3  -1  -3   4   1 ]
        [ -8  -2  -7   5   4 ]




The first step is to obtain the characteristic matrix

           [ 1-x    0    -1    1     0  ]
           [ -4    1-x   -3    2     1  ]
    C(x) = [ -2    -1    -x    1     1  ]
           [ -3    -1    -3   4-x    1  ]
           [ -8    -2    -7    5    4-x ]



Although it is tedious work, we can obtain the characteristic polynomial f(x) = (x - 2)^5. We have one eigenvalue with algebraic multiplicity 5. What is the geometric multiplicity and what is the minimum polynomial for A? Although there is an effective method for determining the minimum polynomial, it is less work and less wasted effort to proceed directly with determining the eigenvectors. Thus, from

           [ -1   0  -1   1   0 ]
           [ -4  -1  -3   2   1 ]
    C(2) = [ -2  -1  -2   1   1 ]
           [ -3  -1  -3   2   1 ]
           [ -8  -2  -7   5   2 ]

we obtain by elementary row operations the Hermite normal form

    [ 1  0  0   0   0 ]
    [ 0  1  0   1  -1 ]
    [ 0  0  1  -1   0 ]
    [ 0  0  0   0   0 ]
    [ 0  0  0   0   0 ]

From this we learn that there are two linearly independent eigenvectors corresponding to 2. The dimension of M^1 is 2. Without difficulty we find the eigenvectors

    α_1 = (0, -1, 1, 1, 0)
    α_2 = (0, 1, 0, 0, 1).

Now we must compute (A - 2I)^2 = (C(2))^2, and obtain

                 [  0  0   0  0  0 ]
                 [  0  0   0  0  0 ]
    (A - 2I)^2 = [ -1  0  -1  1  0 ]
                 [ -1  0  -1  1  0 ]
                 [ -1  0  -1  1  0 ]

The rank of (A - 2I)^2 is 1 and hence M^2 is of dimension 4. The α_1 and α_2 we already have are in M^2 and we must obtain two more vectors in M^2 which, together with α_1 and α_2, will form an independent set. There is quite a bit of freedom for choice and

    α_3 = (0, 1, 0, 0, 0)
    α_4 = (-1, 0, 1, 0, 0)

appear to be as good as any.

Now (A - 2I)^3 = 0, and we know that the minimum polynomial for A is (x - 2)^3. We have this knowledge and quite a bit more for less work than would be required to find the minimum polynomial directly. We see, then, that M^3 = V and we have to find another vector independent of α_1, α_2, α_3, and α_4. Again, there are many possible choices. Some choices will lead to a simpler matrix of transition than will others, and there seems to be no very good way to make the choice that will result in the simplest matrix of transition. Let us take

    α_5 = (0, 0, 0, 1, 0).

We now have the basis {α_1, α_2, α_3, α_4, α_5} such that {α_1, α_2} is a basis of M^1, {α_1, α_2, α_3, α_4} is a basis of M^2, and {α_1, α_2, α_3, α_4, α_5} is a basis of M^3. Following our instructions, we set β_5 = α_5. Then



             [ 0 ]   [ 1 ]
             [ 0 ]   [ 2 ]
    (A - 2I) [ 0 ] = [ 1 ]
             [ 1 ]   [ 2 ]
             [ 0 ]   [ 5 ]



Hence, we set β_3 = (1, 2, 1, 2, 5). Now we must choose β_4 so that {α_1, α_2, β_3, β_4} is a basis for M^2. We can choose β_4 = (-1, 0, 1, 0, 0). Then

             [ 1 ]   [ 0 ]                    [ -1 ]   [ 0 ]
             [ 2 ]   [ 0 ]                    [  0 ]   [ 1 ]
    (A - 2I) [ 1 ] = [ 1 ]     and (A - 2I)   [  1 ] = [ 0 ]
             [ 2 ]   [ 1 ]                    [  0 ]   [ 0 ]
             [ 5 ]   [ 1 ]                    [  0 ]   [ 1 ]

Thus we obtain β_1 = (0, 0, 1, 1, 1) and β_2 = (0, 1, 0, 0, 1). Listing the new basis in the order {β_1, β_3, β_5, β_2, β_4} described above,

        [ 0  1  0  0  -1 ]
        [ 0  2  0  1   0 ]
    P = [ 1  1  0  0   1 ]
        [ 1  2  1  0   0 ]
        [ 1  5  0  1   0 ]

is the matrix of transition that will transform A to the Jordan normal form

    [ 2  1  0  0  0 ]
    [ 0  2  1  0  0 ]
    [ 0  0  2  0  0 ]
    [ 0  0  0  2  1 ]
    [ 0  0  0  0  2 ]
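The arithmetic of this example can be checked mechanically. In the sketch below, the entries of A and P are the ones displayed above; since they have been reconstructed from a hard-to-read printing, treat them as data to verify rather than as given. The product P^{-1}AP should reproduce the Jordan form just shown.

```python
# Check of Example 1 (entries as reconstructed above; verify them against the text).
from sympy import Matrix

A = Matrix([[ 1,  0, -1, 1, 0],
            [-4,  1, -3, 2, 1],
            [-2, -1,  0, 1, 1],
            [-3, -1, -3, 4, 1],
            [-8, -2, -7, 5, 4]])
P = Matrix([[0, 1, 0, 0, -1],
            [0, 2, 0, 1,  0],
            [1, 1, 0, 0,  1],
            [1, 2, 1, 0,  0],
            [1, 5, 0, 1,  0]])
print(P.inv() * A * P)   # blocks of orders 3 and 2 for the eigenvalue 2
```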



Example 2. Let

        [ 5  -1  -3   2  -5 ]
        [ 0   2   0   0   0 ]
    A = [ 1   0   1   1  -2 ]
        [ 0  -1   0   3   1 ]
        [ 1  -1  -1   1   1 ]

The characteristic polynomial is f(x) = -(x - 2)^3 (x - 3)^2. Again we have repeated eigenvalues, one of multiplicity 3 and one of multiplicity 2.



           [ 3  -1  -3   2  -5 ]
           [ 0   0   0   0   0 ]
    C(2) = [ 1   0  -1   1  -2 ]
           [ 0  -1   0   1   1 ]
           [ 1  -1  -1   1  -1 ]

from which we obtain the Hermite normal form

    [ 1  0  -1  0  -2 ]
    [ 0  1   0  0  -1 ]
    [ 0  0   0  1   0 ]
    [ 0  0   0  0   0 ]
    [ 0  0   0  0   0 ]

Again, the geometric multiplicity is less than the algebraic multiplicity. We obtain the eigenvectors

    α_1 = (1, 0, 1, 0, 0)
    α_2 = (2, 1, 0, 0, 1).
Now we must compute (A - 2I)^2. We find

                 [ 1   0  -1  0  -2 ]
                 [ 0   0   0  0   0 ]
    (A - 2I)^2 = [ 0   0   0  0   0 ]
                 [ 1  -2  -1  2   0 ]
                 [ 1  -1  -1  1  -1 ]

from which we obtain the Hermite normal form

    [ 1  0  -1   0  -2 ]
    [ 0  1   0  -1  -1 ]
    [ 0  0   0   0   0 ]
    [ 0  0   0   0   0 ]
    [ 0  0   0   0   0 ]

For the third basis vector we can choose

    α_3 = (0, 1, 0, 1, 0).



Then

             [ 0 ]   [ 1 ]
             [ 1 ]   [ 0 ]
    (A - 2I) [ 0 ] = [ 1 ]
             [ 1 ]   [ 0 ]
             [ 0 ]   [ 0 ]

hence, we have β_3 = α_3, β_1 = α_1, and we can choose β_2 = α_2.

In a similar fashion we find β_4 = (-1, 0, 0, 1, 0) and β_5 = (2, 0, 0, 0, 1) corresponding to the eigenvalue 3. β_4 is an eigenvector and (A - 3I)β_5 = β_4.
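As with Example 1, the result can be verified mechanically. In the sketch below the entries of A are those displayed above (reconstructed from the text, so verify them), and the columns of P are the chain β_1, β_3, the eigenvector β_2, and the chain β_4, β_5.

```python
# Check of Example 2 (entries as reconstructed above; verify them against the text).
from sympy import Matrix

A = Matrix([[5, -1, -3, 2, -5],
            [0,  2,  0, 0,  0],
            [1,  0,  1, 1, -2],
            [0, -1,  0, 3,  1],
            [1, -1, -1, 1,  1]])
# columns: beta_1, beta_3 (chain for eigenvalue 2), beta_2, then beta_4, beta_5 (eigenvalue 3)
P = Matrix([[1, 0, 2, -1, 2],
            [0, 1, 1,  0, 0],
            [1, 0, 0,  0, 0],
            [0, 1, 0,  1, 0],
            [0, 0, 1,  0, 1]])
print(P.inv() * A * P)   # blocks of orders 2 and 1 for eigenvalue 2, order 2 for eigenvalue 3
```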



chapter IV

Linear functionals, bilinear forms, quadratic forms



In this chapter we study scalar-valued functions of vectors. Linear functionals are linear transformations of a vector space into a vector space of dimension 1. As such they are not new to us. But because they are very important, they have been the subject of much investigation and a great deal of special terminology has accumulated for them.

For the first time we make use of the fact that the set of linear transforma- 
tions can profitably be considered to be a vector space. For finite dimensional 
vector spaces the set of linear functionals forms a vector space of the same 
dimension, the dual space. We are concerned with the relations between 
the structure of a vector space and its dual space, and between the representa- 
tions of the various objects in these spaces. 

In Chapter V we carry the vector point of view of linear functionals one 
step further by mapping them into the original vector space. There is a 
certain aesthetic appeal in imposing two separate structures on a single 
vector space, and there is value in doing it because it motivates our con- 
centration on the aspects of these two structures that either look alike or 
are symmetric. For clarity in this chapter, however, we keep these two 
structures separate in two different vector spaces. 

Bilinear forms are functions of two vector variables which are linear in 
each variable separately. A quadratic form is a function of a single vector 
variable which is obtained by identifying the two variables in a bilinear 
form. Bilinear forms and quadratic forms are intimately tied together, 
and this is the principal reason for our treating bilinear forms in detail. 
In Chapter VI we give some applications of quadratic forms to physical 
problems. 

If the field of scalars is the field of complex numbers, then the applications we wish to make of bilinear forms and quadratic forms lead us to modify the definition slightly. In this way we are led to study Hermitian forms. Aside from their definition they present little additional difficulty.



1 | Linear Functionals

Definition. Let V be a vector space over a field of scalars F. A linear transformation φ of V into F is called a linear form or linear functional on V.

Any field can be considered to be a 1-dimensional vector space over itself (see Exercise 10, Section I-1). It is possible, for example, to imagine two copies of F, one of which we label U. We retain the operation of addition in U, but drop the operation of multiplication. We then define scalar multiplication in the obvious way: the product is computed as if both the scalar and the vector were in the same copy of F and the product is taken to be an element of U. Thus the concept of a linear functional is not really something new. It is our familiar linear transformation restricted to a special case. Linear functionals are so useful, however, that they deserve a special name and particular study. Linear concepts appear throughout mathematics, particularly in applied mathematics, and in all cases linear functionals play an important part. It is usually the case, however, that special terminology is used which tends to obscure the widespread occurrence of this concept.

The term "linear form" would be more consistent with other usage throughout this book and the history of the theory of matrices. But the term "linear functional" has come to be almost universally adopted.

Theorem 1.1. If V is a vector space of dimension n over F, the set of all linear functionals on V is a vector space of dimension n.

proof. If φ and ψ are linear functionals on V, by φ + ψ we mean the mapping defined by (φ + ψ)(α) = φ(α) + ψ(α) for all α ∈ V. For any a ∈ F, by aφ we mean the mapping defined by (aφ)(α) = a[φ(α)] for all α ∈ V. We must then show that with these laws for vector addition and scalar multiplication of linear functionals the axioms of a vector space are satisfied.

These demonstrations are not difficult and they are left to the reader. (Remember that proving axioms A1 and B1 are satisfied really requires showing that φ + ψ and aφ, as defined, are linear functionals.)

We call the vector space of all linear functionals on V the dual or conjugate space of V and denote it by V̂ (pronounced "vee hat" or "vee caret"). We have yet to show that V̂ is of dimension n. Let A = {α_1, α_2, ..., α_n} be a basis of V. Define φ_i by the rule that for any α = Σ_{j=1}^n a_j α_j, φ_i(α) = a_i ∈ F. We shall call φ_i the ith coordinate function.

For any β = Σ_{j=1}^n b_j α_j we have φ_i(β) = b_i, and φ_i(α + β) = φ_i(Σ_{j=1}^n (a_j + b_j)α_j) = a_i + b_i = φ_i(α) + φ_i(β). Also φ_i(aα) = φ_i(Σ_{j=1}^n a a_j α_j) = a a_i = a φ_i(α). Thus φ_i is a linear functional.

Suppose that Σ_{i=1}^n b_i φ_i = 0. Then (Σ_{i=1}^n b_i φ_i)(α) = 0 for all α ∈ V. In particular, for α_j we have (Σ_{i=1}^n b_i φ_i)(α_j) = Σ_{i=1}^n b_i φ_i(α_j) = b_j = 0. Hence, all b_i = 0 and the set {φ_1, φ_2, ..., φ_n} must be linearly independent. On the other hand, for any φ ∈ V̂ and any α = Σ_{i=1}^n a_i α_i ∈ V, we have

    φ(α) = φ(Σ_{i=1}^n a_i α_i) = Σ_{i=1}^n a_i φ(α_i).                (1.1)

If we let φ(α_j) = b_j, then for Σ_{j=1}^n b_j φ_j we have

    (Σ_{j=1}^n b_j φ_j)(α) = Σ_{j=1}^n b_j φ_j(α) = Σ_{j=1}^n b_j a_j = φ(α).    (1.2)

Thus the set {φ_1, ..., φ_n} = Â spans V̂ and forms a basis of V̂. This shows that V̂ is of dimension n. □

The basis Â of V̂ that we have constructed in the proof of Theorem 1.1 has a very special relation to the basis A. This relation is characterized by the equations

    φ_i(α_j) = δ_ij,    (1.3)

for all i, j. In the proof of Theorem 1.1 we have shown that a basis satisfying these conditions exists. For each i, the conditions in Equation (1.3) specify the values of φ_i on all the vectors in the basis A. Thus φ_i is uniquely determined as a linear functional. And thus Â is uniquely determined by A and the conditions (1.3). We call Â the basis dual to the basis A.
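In coordinates this construction is very concrete. If V = R^n and the basis vectors α_j are written as the columns of a matrix S, then the coordinate functions φ_i are represented by the rows of S^{-1}, and condition (1.3) is simply the statement S^{-1}S = I. A small sketch (the basis is an arbitrary illustration):

```python
# Sketch (V = R^3, vectors written as columns): the dual basis as the rows of S^{-1}.
import numpy as np

S = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])          # columns: a basis of R^3
dual = np.linalg.inv(S)                  # row i represents the coordinate function phi_i

print(dual @ S)                          # identity matrix: phi_i(alpha_j) = delta_ij
v = np.array([2.0, 3.0, 4.0])
print(dual[1] @ v)                       # phi_2(v): the second coordinate of v in this basis
```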

For a linear functional φ = Σ_{i=1}^n b_i φ_i we have

    φ(α_j) = Σ_{i=1}^n b_i φ_i(α_j) = b_j,

so that, as a linear transformation, φ is represented by the 1 x n matrix [b_1 ··· b_n]. For this reason we choose to represent the linear functionals in V̂ by one-row matrices. With respect to the basis Â in V̂, φ = Σ_{i=1}^n b_i φ_i will be represented by the row [b_1 ··· b_n] = B. It might be argued that, since V̂ is a vector space, the elements of V̂ should be represented by columns. But the set of all linear transformations of one vector space into another also forms a vector space, and we can as justifiably choose to emphasize the aspect of V̂ as a set of linear transformations. At most, the choice of a representing notation is a matter of taste and convenience. The choice we have made means that some adjustments will have to be made when using the matrix of transition to change the coordinates of a linear functional when the basis is changed. But no choice of representing notation seems to avoid all such difficulties and the choice we have made seems to offer the most advantages.
If the vector ξ ∈ V is represented by the n-tuple (x_1, ..., x_n) = X, then we can compute φ(ξ) directly in terms of the representations:

    φ(ξ) = φ(Σ_{j=1}^n x_j α_j) = Σ_{j=1}^n b_j x_j = [b_1 ··· b_n] X = BX.    (1.4)



EXERCISES 

1. Let A = {α_1, α_2, α_3} be a basis in a 3-dimensional vector space V over R. Let Â = {φ_1, φ_2, φ_3} be the basis in V̂ dual to A. Any vector ξ ∈ V can be written in the form ξ = x_1α_1 + x_2α_2 + x_3α_3. Determine which of the following functions on V are linear functionals. Determine the coordinates of those that are linear functionals in terms of the basis Â.

(a) φ(ξ) = x_1 + x_2 + x_3.
(b) φ(ξ) = (x_1 + x_2)^2.
(c) φ(ξ) = √2 x_1.
(d) φ(ξ) = x_2 - 3x_1.
(e) φ(ξ) = x_2 - 1.

2. For each of the following bases of R^3 determine the dual basis in R̂^3.

(a) {(1, 0, 0), (0, 1, 0), (0, 0, 1)}.
(b) {(1, 0, 0), (1, 1, 0), (1, 1, 1)}.
(c) {(1, 0, -1), (-1, 1, 0), (0, 1, 1)}.

3. Let V = P_n, the space of polynomials of degree less than n over R. For a fixed a ∈ R, let φ(p) = p^{(k)}(a), where p^{(k)}(x) is the kth derivative of p(x) ∈ P_n. Show that φ is a linear functional.

4. Let V be the space of real functions continuous on the interval [0, 1], and let g be a fixed function in V. For each f ∈ V define

    L_g(f) = ∫_0^1 f(t)g(t) dt.



Show that L_g is a linear functional on V. Show that if L_g(f) = 0 for every g ∈ V, then f = 0.

5. Let A = {α_1, ..., α_n} be a basis of V and let Â = {φ_1, ..., φ_n} be the basis of V̂ dual to the basis A. Show that an arbitrary α ∈ V can be represented in the form

    α = Σ_{i=1}^n φ_i(α) α_i.

6. Let V be a vector space of finite dimension n ≥ 2 over F. Let α and β be two vectors in V such that {α, β} is linearly independent. Show that there exists a linear functional φ such that φ(α) = 1 and φ(β) = 0.

7. Let V = P_n, the space of polynomials over F of degree less than n (n ≥ 1). Let a ∈ F be any scalar. For each p(x) ∈ P_n, p(a) is a scalar. Show that the mapping of p(x) onto p(a) is a linear functional on P_n (which we denote by σ_a). Show that if a ≠ b then σ_a ≠ σ_b.

8. (Continuation) In Exercise 7 we showed that for each a ∈ F there is defined a linear functional σ_a ∈ P̂_n. Show that if n > 1, then not every linear functional in P̂_n can be obtained in this way.

9. (Continuation) Let {a_1, ..., a_n} be a set of n distinct scalars. Let f(x) = (x - a_1)(x - a_2) ··· (x - a_n) and h_k(x) = f(x)/(x - a_k). Show that h_k(a_j) = δ_jk f'(a_j), where f'(x) is the derivative of f(x).

10. (Continuation) For the a_k given in Exercise 9, let

    σ_j = (1/f'(a_j)) σ_{a_j}.

Show that {σ_1, ..., σ_n} is linearly independent and a basis of P̂_n. Show that {h_1(x), ..., h_n(x)} is linearly independent and, hence, a basis of P_n. (Hint: Apply σ_j to Σ_{k=1}^n b_k h_k(x).) Show that {σ_1, ..., σ_n} is the basis dual to {h_1(x), ..., h_n(x)}.

11. (Continuation) Let p(x) be any polynomial in P_n. Show that p(x) can be represented in the form

    p(x) = Σ_{k=1}^n (p(a_k)/f'(a_k)) h_k(x).

(Hint: Use Exercise 5.) This formula is known as the Lagrange interpolation formula. It yields the polynomial of least degree taking on the n specified values {p(a_1), ..., p(a_n)} at the points {a_1, ..., a_n}.

12. Let W be a proper subspace of the n-dimensional vector space V. Let α_0 be a vector in V but not in W. Show that there is a linear functional φ ∈ V̂ such that φ(α_0) = 1 and φ(α) = 0 for all α ∈ W.

13. Let W be a proper subspace of the n-dimensional vector space V. Let ψ be a linear functional on W. It must be emphasized that ψ is an element of Ŵ and not an element of V̂. Show that there is at least one element φ ∈ V̂ such that φ coincides with ψ on W.

14. Show that if α ≠ 0, there is a linear functional φ such that φ(α) ≠ 0.

15. Let α and β be vectors such that φ(β) = 0 implies φ(α) = 0. Show that α is a multiple of β.

2 | Duality

Until now, we have encouraged an unsymmetric point of view with respect to V and V̂. Indeed, it is natural to consider φ(α) for a chosen φ and a range of choices for α. However, there is no reason why we should not choose a fixed α and consider the expression φ(α) for a range of choices for φ. Since (b_1φ_1 + b_2φ_2)(α) = b_1φ_1(α) + b_2φ_2(α), we see that α behaves like a linear functional on V̂.

This leads us to consider the space V̂̂ of all linear functionals on V̂. Corresponding to any α ∈ V we can define a linear functional α̂ in V̂̂ by the rule α̂(φ) = φ(α) for all φ ∈ V̂. Let the mapping defined by this rule be denoted by J, that is, J(α) = α̂. Since J(aα + bβ)(φ) = φ(aα + bβ) = aφ(α) + bφ(β) = aJ(α)(φ) + bJ(β)(φ) = [aJ(α) + bJ(β)](φ), we see that J is a linear transformation mapping V into V̂̂.

Theorem 2.1. If V is finite dimensional, the mapping J of V into V̂̂ is a one-to-one linear transformation of V onto V̂̂.

proof. Let V be of dimension n. We have already shown that J is linear and into. If J(α) = 0 then J(α)(φ) = 0 for all φ ∈ V̂. In particular, J(α)(φ_i) = 0 for the basis of coordinate functions. Thus if α = Σ_{i=1}^n a_i α_i we see that

    J(α)(φ_i) = φ_i(α) = Σ_{j=1}^n a_j φ_i(α_j) = a_i = 0

for each i = 1, ..., n. Thus α = 0 and the kernel of J is {0}; that is, J(V) is of dimension n. On the other hand, if V is of dimension n, then V̂ and V̂̂ are also of dimension n. Hence J(V) = V̂̂ and the mapping is onto. □

If the mapping J of V into V̂̂ is actually onto V̂̂ we say that V is reflexive. Thus Theorem 2.1 says that a finite dimensional vector space is reflexive. Infinite dimensional vector spaces are not reflexive, but a proof of this assertion is beyond the scope of this book. Moreover, infinite dimensional vector spaces of interest have a topological structure in addition to the algebraic structure we are studying. This additional condition requires a more restricted definition of a linear functional. With this restriction the dual space is smaller than our definition permits. Under these conditions it is again possible for the dual of the dual to be covered by the mapping J.

Since J is onto, we identify V and J(V), and consider V as the space of linear functionals on V̂. Thus V and V̂ are considered in a symmetrical position and we speak of them as dual spaces. We also drop the parentheses from the notation, except when required for grouping, and write φα instead of φ(α). The bases {α_1, ..., α_n} and {φ_1, ..., φ_n} are dual bases if and only if φ_iα_j = δ_ij.



EXERCISES 

1. Let A = {α_1, ..., α_n} be a basis of V, and let Â = {φ_1, ..., φ_n} be the basis of V̂ dual to the basis A. Show that an arbitrary φ ∈ V̂ can be represented in the form

    φ = Σ_{i=1}^n φ(α_i) φ_i.

2. Let V be a vector space of finite dimension n ≥ 2 over F. Let φ and ψ be two linear functionals in V̂ such that {φ, ψ} is linearly independent. Show that there exists a vector α such that φ(α) = 1 and ψ(α) = 0.

3. Let φ_0 be a linear functional not in the subspace S of the space of linear functionals V̂. Show that there exists a vector α such that φ_0(α) = 1 and φ(α) = 0 for all φ ∈ S.

4. Show that if φ ≠ 0, there is a vector α such that φ(α) ≠ 0.

5. Let φ and ψ be two linear functionals such that φ(α) = 0 implies ψ(α) = 0. Show that ψ is a multiple of φ.

3 | Change of Basis

If the basis A' = {α'_1, α'_2, ..., α'_n} is used instead of the basis A = {α_1, α_2, ..., α_n}, we ask how the dual basis Â' = {φ'_1, ..., φ'_n} is related to the dual basis Â = {φ_1, ..., φ_n}. Let P = [p_ij] be the matrix of transition from the basis A to the basis A'. Thus α'_j = Σ_{i=1}^n p_ij α_i. Since φ_i(α'_j) = Σ_{k=1}^n p_kj φ_i(α_k) = p_ij, we see that φ_i = Σ_{j=1}^n p_ij φ'_j. This means that P^T is the matrix of transition from the basis Â' to the basis Â. Hence, (P^T)^{-1} = (P^{-1})^T is the matrix of transition from Â to Â'.

Since linear functionals are represented by row matrices instead of column matrices, the matrix of transition appears in the formulas for change of coordinates in a slightly different way. Let B = [b_1 ··· b_n] be the representation of a linear functional φ with respect to the basis Â and B' = [b'_1 ··· b'_n] be its representation with respect to the basis Â'. Then

    φ = Σ_{i=1}^n b_i φ_i = Σ_{i=1}^n b_i (Σ_{j=1}^n p_ij φ'_j) = Σ_{j=1}^n (Σ_{i=1}^n b_i p_ij) φ'_j.    (3.1)

Thus,

    B' = BP.    (3.2)
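Formula (3.2) is easy to test numerically: the value of the functional must not depend on the coordinate system, so BX and B'X' must agree when X = PX' and B' = BP. A sketch with arbitrary numbers (not from the text):

```python
# Sketch: under a change of basis with matrix of transition P, vector coordinates
# change by X = P X', functional coordinates by B' = B P, and B X = B' X'.
import numpy as np

P = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])          # non-singular matrix of transition (det = 2)
B = np.array([[2.0, -1.0, 3.0]])         # row matrix representing a functional
X = np.array([[1.0], [4.0], [2.0]])      # column matrix representing a vector

X_new = np.linalg.inv(P) @ X             # new coordinates of the vector
B_new = B @ P                            # new coordinates of the functional, by (3.2)
print(B @ X, B_new @ X_new)              # the same scalar value
```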

We are looking at linear functionals from two different points of view. Considered as a linear transformation, the effect of a change of coordinates is given by formula (4.5) of Chapter II, which is identical with (3.2) above. Considered as a vector, the effect of a change of coordinates is given by formula (4.3) of Chapter II. In this case we would represent φ by B^T, since vectors are represented by column matrices. Then, since (P^{-1})^T is the matrix of transition, we would have

    B^T = (P^{-1})^T B'^T = (B'P^{-1})^T,
or                                                (3.3)
    B = B'P^{-1},

which is equivalent to (3.2). Thus the end result is the same from either point of view. It is this two-sided aspect of linear functionals which has made them so important and their study so fruitful.

Example 1. In analytic geometry, a hyperplane passing through the origin is the set of all points with coordinates (x_1, x_2, ..., x_n) satisfying an equation of the form b_1x_1 + b_2x_2 + ··· + b_nx_n = 0. Thus the n-tuple [b_1 b_2 ··· b_n] can be considered as representing the hyperplane. Of course, a given hyperplane can be represented by a family of equations, so that there is not a one-to-one correspondence between the hyperplanes through the origin and the n-tuples. However, we can still profitably consider the space of hyperplanes as dual to the space of points.

Suppose the coordinate system is changed so that points now have the coordinates (y_1, ..., y_n) where x_i = Σ_{j=1}^n a_ij y_j. Then the equation of the hyperplane becomes

    Σ_{i=1}^n b_i x_i = Σ_{i=1}^n b_i (Σ_{j=1}^n a_ij y_j) = Σ_{j=1}^n (Σ_{i=1}^n b_i a_ij) y_j = Σ_{j=1}^n c_j y_j = 0.    (3.4)

Thus the equation of the hyperplane is transformed by the rule c_j = Σ_{i=1}^n b_i a_ij. Notice that while we have expressed the old coordinates in terms of the new coordinates we have expressed the new coefficients in terms of the old coefficients. This is typical of related transformations in dual spaces.

Example 2. A much more illuminating example occurs in the calculus of functions of several variables. Suppose that w is a function of the variables x_1, x_2, ..., x_n, w = f(x_1, x_2, ..., x_n). Then it is customary to write down formulas of the following form:

    dw = (∂w/∂x_1) dx_1 + (∂w/∂x_2) dx_2 + ··· + (∂w/∂x_n) dx_n,    (3.5)

and

    ∇w = (∂w/∂x_1, ∂w/∂x_2, ..., ∂w/∂x_n).    (3.6)

dw is usually called the differential of w, and ∇w is usually called the gradient of w. It is also customary to call ∇w a vector and to regard dw as a scalar, approximately a small increment in the value of w.

The difficulty in regarding ∇w as a vector is that its coordinates do not follow the rules for a change of coordinates of a vector. For example, let us consider (x_1, x_2, ..., x_n) as the coordinates of a vector in a linear vector space. This implies the existence of a basis {α_1, ..., α_n} such that the linear combination

    ξ = Σ_{i=1}^n x_i α_i    (3.7)

is the vector with coordinates (x_1, x_2, ..., x_n). Let {β_1, ..., β_n} be a new basis with matrix of transition P = [p_ij] where

    β_j = Σ_{i=1}^n p_ij α_i.    (3.8)

Then, if ξ = Σ_{j=1}^n y_j β_j is the representation of ξ in the new coordinate system, we would have

    x_i = Σ_{j=1}^n p_ij y_j,    (3.9)
or
    x_i = Σ_{j=1}^n (∂x_i/∂y_j) y_j.    (3.10)

Let us contrast this with the formulas for changing the coordinates of ∇w. From the calculus of functions of several variables we know that

    ∂w/∂y_j = Σ_{i=1}^n (∂w/∂x_i)(∂x_i/∂y_j).    (3.11)

This formula corresponds to (3.2). Thus ∇w changes coordinates as if it were in the dual space.

In vector analysis it is customary to call a vector whose coordinates change according to formula (3.10) a contravariant vector, and a vector whose coordinates change according to formula (3.11) a covariant vector. The reader should verify that if P = [∂x_i/∂y_j], then the matrix whose (i, j) entry is ∂y_j/∂x_i equals (P^T)^{-1}. Thus (3.11) is equivalent to the formula

    ∂w/∂x_i = Σ_{j=1}^n (∂y_j/∂x_i)(∂w/∂y_j).    (3.12)

From the point of view of linear vector spaces it is a mistake to regard both types of vectors as being in the same vector space. As a matter of fact, their sum is not defined. It is clearer and more fruitful to consider the covariant and contravariant vectors to be taken from a pair of dual spaces. This point of view is now taken in modern treatments of advanced calculus and vector analysis. Further details in developing this point of view are given in Chapter VI, Section 4.

In traditional discussions of these topics, all quantities that are represented by n-tuples are called vectors. In fact, the n-tuples themselves are called vectors. Also, it is customary to restrict the discussion to coordinate changes in which both covariant and contravariant vectors transform according to the same formulas. This amounts to having P, the matrix of transition, satisfy the condition (P^{-1})^T = P. While this does simplify the discussion it makes it almost impossible to understand the foundations of the subject.

Let A = {α_1, ..., α_n} be a basis of V and let Â = {φ_1, ..., φ_n} be the dual basis in V̂. Let B = {β_1, ..., β_n} be any new basis of V. We are asked to find the dual basis B̂ in V̂. This problem is ordinarily posed by giving the representations of the β_j with respect to the basis A and expecting the representations of the elements of the dual basis with respect to Â. Let the β_j be represented with respect to A in the form

    β_j = Σ_{i=1}^n p_ij α_i,    (3.13)

and let

    ψ_i = Σ_{j=1}^n q_ij φ_j    (3.14)

be the representations of the elements of the dual basis B̂ = {ψ_1, ..., ψ_n}. Then

    δ_ik = ψ_i(β_k) = (Σ_{j=1}^n q_ij φ_j)(Σ_{l=1}^n p_lk α_l) = Σ_{j=1}^n Σ_{l=1}^n q_ij p_lk φ_j(α_l) = Σ_{j=1}^n q_ij p_jk.    (3.15)

In matrix form, (3.15) is equivalent to

    I = QP.    (3.16)

Q is the inverse of P. Because of (3.14), the ψ_i are represented by the rows of Q. Thus, to find the dual basis, we write the representations of the basis B in the columns of P, find the inverse matrix P^{-1}, and read out the representations of the basis B̂ in the rows of P^{-1}.
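This recipe is immediate to carry out by machine. The sketch below applies it to the basis of Exercise 3 below, {(1, 0, 0), (1, 1, 0), (1, 1, 1)}; exact arithmetic via sympy.

```python
# The recipe in coordinates: basis vectors in the columns of P,
# dual basis read from the rows of P^{-1}.
from sympy import Matrix

P = Matrix([[1, 1, 1],
            [0, 1, 1],
            [0, 0, 1]])          # columns: (1,0,0), (1,1,0), (1,1,1)
Q = P.inv()
print(Q)                         # rows: [1, -1, 0], [0, 1, -1], [0, 0, 1] -- the dual basis
print(Q * P)                     # identity: psi_i(beta_j) = delta_ij
```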

EXERCISES 

1. Let A = {(1, 0, ..., 0), (0, 1, ..., 0), ..., (0, 0, ..., 1)} be a basis of R^n. The basis of R̂^n dual to A has the same coordinates. It is of interest to see if there are other bases of R^n for which the dual basis has exactly the same coordinates. Let A' be another basis of R^n with matrix of transition P. What condition should P satisfy in order that the elements of the basis dual to A' have the same coordinates as the corresponding elements of the basis A'?

2. Let A = {α_1, α_2, α_3} be a basis of a 3-dimensional vector space V, and let Â = {φ_1, φ_2, φ_3} be the basis of V̂ dual to A. Then let A' = {(1, 1, 1), (1, 0, 1), (0, 1, -1)} be another basis of V (where the coordinates are given in terms of the basis A). Use the matrix of transition to find the basis Â' dual to A'.

3. Use the matrix of transition to find the basis dual to {(1, 0, 0), (1, 1, 0), (1, 1, 1)}.

4. Use the matrix of transition to find the basis dual to {(1, 0, -1), (-1, 1, 0), (0, 1, 1)}.

5. Let B represent a linear functional φ, and X a vector ξ with respect to dual bases, so that BX is the value φξ of the linear functional. Let P be the matrix of transition to a new basis so that if X' is the new representation of ξ, then X = PX'. By substituting PX' for X in the expression for the value of φξ obtain another proof that BP is the representation of φ in the new dual coordinate system.

4 | Annihilators

Definition. Let V be an n-dimensional vector space and V̂ its dual. If, for an α ∈ V and a φ ∈ V̂, we have φα = 0, we say that φ and α are orthogonal. Since φ and α are from different vector spaces, it should be clear that we do not intend to say that φ and α are at "right angles."

Definition. Let W be a subset (not necessarily a subspace) of V. The set of all linear functionals φ such that φα = 0 for all α ∈ W is called the annihilator of W, and we denote it by W^⊥. Any φ ∈ W^⊥ is called an annihilator of W.

Theorem 4.1. The annihilator W^⊥ of W is a subspace of V̂. If W is a subspace of dimension p, then W^⊥ is of dimension n - p.

proof. If φ and ψ are in W^⊥, then (aφ + bψ)α = aφα + bψα = 0 for all α ∈ W. Hence, W^⊥ is a subspace of V̂.

Suppose W is a subspace of V of dimension p, and let A = {α_1, ..., α_n} be a basis of V such that {α_1, ..., α_p} is a basis of W. Let Â = {φ_1, ..., φ_n} be the dual basis of A. For {φ_{p+1}, ..., φ_n} we see that φ_jα_i = 0 for all i ≤ p. Hence, {φ_{p+1}, ..., φ_n} is a subset of the annihilator of W. On the other hand, if φ = Σ_{i=1}^n b_i φ_i is an annihilator of W, we have φα_i = 0 for each i ≤ p. But φα_i = Σ_{j=1}^n b_j φ_jα_i = b_i. Hence, b_i = 0 for i ≤ p and the set {φ_{p+1}, ..., φ_n} spans W^⊥. Thus {φ_{p+1}, ..., φ_n} is a basis for W^⊥, and W^⊥ is of dimension n - p. The dimension of W^⊥ is called the codimension of W. □

It should also be clear from this argument that W is exactly the set of all α ∈ V annihilated by all φ ∈ W^⊥. Thus we have

Theorem 4.2. If S is any subset of V̂, the set of all α ∈ V annihilated by all φ ∈ S is a subspace of V, denoted by S^⊥. If S is a subspace of dimension r, then S^⊥ is a subspace of dimension n - r. □

Theorem 4.2 is really Theorem 1.16 of Chapter II in a different form. If a linear transformation of V into another vector space W is represented by a matrix A, then each row of A can be considered as representing a linear functional on V. The number r of linearly independent rows of A is the dimension of the subspace S of V̂ spanned by these linear functionals. S^⊥ is the kernel of the linear transformation and its dimension is n - r.

The symmetry in this discussion should be apparent. If φ ∈ W^⊥, then φα = 0 for all α ∈ W. On the other hand, for α ∈ W, φα = 0 for all φ ∈ W^⊥.

Theorem 4.3. If W is a subspace, (W^⊥)^⊥ = W.

proof. By definition, (W^⊥)^⊥ = W^⊥⊥ is the set of α ∈ V such that φα = 0 for all φ ∈ W^⊥. Clearly, W ⊂ W^⊥⊥. Since dim W^⊥⊥ = n - dim W^⊥ = dim W, W^⊥⊥ = W. □

This also leads to a reinterpretation of the discussion in Section II-8. A subspace W of V of dimension p can be characterized by giving its annihilator W^⊥ ⊂ V̂ of dimension r = n - p.




Theorem 4.4. If W_1 and W_2 are two subspaces of V, and W_1^⊥ and W_2^⊥ are their respective annihilators in V̂, the annihilator of W_1 + W_2 is W_1^⊥ ∩ W_2^⊥ and the annihilator of W_1 ∩ W_2 is W_1^⊥ + W_2^⊥.

proof. If φ is an annihilator of W_1 + W_2, then φ annihilates all α ∈ W_1 and all β ∈ W_2 so that φ ∈ W_1^⊥ ∩ W_2^⊥. If φ ∈ W_1^⊥ ∩ W_2^⊥, then for all α ∈ W_1 and β ∈ W_2 we have φα = 0 and φβ = 0. Hence, φ(aα + bβ) = aφα + bφβ = 0 so that φ annihilates W_1 + W_2. This shows that (W_1 + W_2)^⊥ = W_1^⊥ ∩ W_2^⊥.

The symmetry between the annihilator and the annihilated means that the second part of the theorem follows immediately from the first. Namely, since (W_1 + W_2)^⊥ = W_1^⊥ ∩ W_2^⊥, we have by substituting W_1^⊥ and W_2^⊥ for W_1 and W_2, (W_1^⊥ + W_2^⊥)^⊥ = (W_1^⊥)^⊥ ∩ (W_2^⊥)^⊥ = W_1 ∩ W_2. Hence, (W_1 ∩ W_2)^⊥ = W_1^⊥ + W_2^⊥. □

Now the mechanics for finding the sum of two subspaces is somewhat simpler than that for finding the intersection. To find the sum we merely combine the two bases for the two subspaces and then discard dependent vectors until an independent spanning set for the sum remains. It happens that to find the intersection W_1 ∩ W_2 it is easier to find W_1^⊥ and W_2^⊥ and then W_1^⊥ + W_2^⊥ and obtain W_1 ∩ W_2 as (W_1^⊥ + W_2^⊥)^⊥, than it is to find the intersection directly.

The example in Chapter II-8, page 71, is exactly this process carried out in detail. In the notation of this discussion E_1 = W_1^⊥ and E_2 = W_2^⊥.
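The procedure just described is easy to mechanize: with each subspace given by spanning vectors written as the rows of a matrix, its annihilator is the null space of that matrix, and one further null-space computation recovers the intersection. A sketch (the subspaces of R^4 are arbitrary illustrations):

```python
# Sketch: computing W1 ∩ W2 as (W1⊥ + W2⊥)⊥.  A row c annihilates the rows of W
# exactly when W c^T = 0, so nullspace(W) gives the annihilator.
from sympy import Matrix

W1 = Matrix([[1, 0, 1, 0],
             [0, 1, 0, 1]])
W2 = Matrix([[1, 1, 1, 1],
             [1, 0, 0, 0]])

ann1 = W1.nullspace()                         # basis of W1⊥ (as columns)
ann2 = W2.nullspace()                         # basis of W2⊥
stacked = Matrix.hstack(*ann1, *ann2).T       # rows span W1⊥ + W2⊥
intersection = stacked.nullspace()            # basis of (W1⊥ + W2⊥)⊥ = W1 ∩ W2
print([v.T for v in intersection])            # a single vector, a multiple of (1, 1, 1, 1)
```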

Let V be a vector space, V̂ the corresponding dual vector space, and let W be a subspace of V. Since W ⊂ V, is there any simple relation between Ŵ and V̂? There is a relation, but it is fairly sophisticated. Any function defined on all of V is certainly defined on any subset. A linear functional φ ∈ V̂, therefore, defines a function on W, which we have called the restriction of φ to W. This does not mean that V̂ ⊂ Ŵ; it means that the restriction defines a mapping of V̂ into Ŵ.

Let R denote the mapping that sends each φ ∈ V̂ to its restriction to W. We call R the restriction mapping. It is easily seen that R is linear. The kernel of R is the set of all φ ∈ V̂ such that φ(α) = 0 for all α ∈ W. Thus K(R) = W^⊥. Since dim Ŵ = dim W = n - dim W^⊥ = n - dim K(R), the restriction map is an epimorphism. Every linear functional on W is the restriction of a linear functional on V.

Since K(R) = W^⊥, we have also shown that Ŵ and V̂/W^⊥ are isomorphic. But two vector spaces of the same dimension are isomorphic in many ways. We have done more than show that Ŵ and V̂/W^⊥ are isomorphic. We have shown that there is a canonical isomorphism that can be specified in a natural way independent of any coordinate system. If φ̄ is a residue class in V̂/W^⊥, and φ is any element of this residue class, then φ̄ and R(φ) correspond under this natural isomorphism. If η denotes the natural homomorphism of V̂ onto V̂/W^⊥, and τ denotes the mapping of φ̄ onto R(φ) defined above, then R = τη, and τ is uniquely determined by R and η and this relation.

Theorem 4.5. Let W be a subspace of V and let W^⊥ be the annihilator of W in V̂. Then Ŵ is isomorphic to V̂/W^⊥. Furthermore, if R is the restriction map of V̂ onto Ŵ, if η is the natural homomorphism of V̂ onto V̂/W^⊥, and τ is the unique isomorphism of V̂/W^⊥ onto Ŵ characterized by the condition R = τη, then τ(φ̄) = R(φ) where φ is any linear functional in the residue class φ̄ ∈ V̂/W^⊥. □

EXERCISES 

1. (a) Find a basis for the annihilator of W = <(1, 0, -1), (1, -1, 0), (0, 1, -1)>.
(b) Find a basis for the annihilator of W = <(1, 1, 1, 1, 1), (1, 0, 1, 0, 1), (0, 1, 1, 1, 0), (2, 0, 0, 1, 1), (2, 1, 1, 2, 1), (1, -1, -1, -2, 2), (1, 2, 3, 4, -1)>. What are the dimensions of W and W^⊥?

2. Find a non-zero linear functional which takes on the same non-zero value for ξ_1 = (1, 2, 3), ξ_2 = (2, 1, 1), and ξ_3 = (1, 0, 1).

3. Use an argument based on the dimension of the annihilator to show that if α ≠ 0, there is a φ ∈ V̂ such that φα ≠ 0.

4. Show that if S ⊂ T, then T^⊥ ⊂ S^⊥.

5. Show that <S> = S^⊥⊥.

6. Show that if S and T are subsets of V each containing 0, then

    (S + T)^⊥ ⊂ S^⊥ ∩ T^⊥,
and
    S^⊥ + T^⊥ ⊂ (S ∩ T)^⊥.

7. Show that if S and T are subspaces of V, then

    (S + T)^⊥ = S^⊥ ∩ T^⊥,
and
    S^⊥ + T^⊥ = (S ∩ T)^⊥.

8. Show that if S and T are subspaces of V such that the sum S + T is direct, then S^⊥ + T^⊥ = V̂.

9. Show that if S and T are subspaces of V such that S + T = V, then S^⊥ ∩ T^⊥ = {0}.

10. Show that if S and T are subspaces of V such that S ⊕ T = V, then V̂ = S^⊥ ⊕ T^⊥. Show that S^⊥ is isomorphic to T̂ and that T^⊥ is isomorphic to Ŝ.

11. Let V be a vector space over the real numbers, and let φ be a non-zero linear functional on V. We refer to the subspace S of V annihilated by φ as a hyperplane of V. Let S^+ = {α | φ(α) > 0}, and S^- = {α | φ(α) < 0}. We call S^+ and S^- the two sides of the hyperplane S. If α and β are two vectors, the line segment joining α and β is defined to be the set {tα + (1 - t)β | 0 ≤ t ≤ 1}, which we denote by αβ. Show that if α and β are both in the same side of S, then each vector in αβ is also in the same side. And show that if α and β are in opposite sides of S, then αβ contains a vector in S.

5 | The Dual of a Linear Transformation

Let U and V be vector spaces and let σ be a linear transformation mapping U into V. Let V̂ be the dual space of V and let φ be a linear functional on V. For each α ∈ U, σ(α) ∈ V so that φ can be applied to σ(α). Thus φ[σ(α)] ∈ F and φσ can be considered to be a mapping which maps U into F. For α, β ∈ U and a, b ∈ F we have φ[σ(aα + bβ)] = φ[aσ(α) + bσ(β)] = aφσ(α) + bφσ(β) so that we have shown

Theorem 5.1. For σ a linear transformation of U into V, and φ ∈ V̂, the mapping φσ defined by φσ(α) = φ[σ(α)] is a linear functional on U; that is, φσ ∈ Û. □

Theorem 5.2. For a given linear transformation σ mapping U into V, the mapping of V̂ into Û defined by making φ ∈ V̂ correspond to φσ ∈ Û is a linear transformation of V̂ into Û.

proof. For φ_1, φ_2 ∈ V̂ and a, b ∈ F, (aφ_1 + bφ_2)σ(α) = aφ_1σ(α) + bφ_2σ(α) for all α ∈ U, so that aφ_1 + bφ_2 in V̂ is mapped onto aφ_1σ + bφ_2σ ∈ Û and the mapping defined is linear. □

Definition. The mapping of φ ∈ V̂ onto φσ ∈ Û is called the dual of σ and is denoted by σ̂. Thus σ̂(φ) = φσ.

Let A be the matrix representing σ with respect to the bases A in U and B in V. Let Â and B̂ be the dual bases in Û and V̂, respectively. The question now arises: "How is the matrix representing σ̂ with respect to the bases B̂ and Â related to the matrix representing σ with respect to the bases A and B?"

For A = {α_1, ..., α_m} and B = {β_1, ..., β_n} we have σ(α_j) = Σ_{i=1}^n a_ij β_i. Let {φ_1, ..., φ_m} be the basis of Û dual to A and let {ψ_1, ..., ψ_n} be the basis of V̂ dual to B. Then for ψ_i ∈ V̂ we have

    σ̂(ψ_i)(α_j) = ψ_i(σ(α_j)) = ψ_i(Σ_{k=1}^n a_kj β_k) = a_ij.    (5.1)

The linear functional on U which has the effect [σ̂(ψ_i)](α_j) = a_ij is σ̂(ψ_i) = Σ_{k=1}^m a_ik φ_k. If ψ = Σ_{i=1}^n b_i ψ_i, then σ̂(ψ) = Σ_{i=1}^n b_i (Σ_{k=1}^m a_ik φ_k) = Σ_{k=1}^m (Σ_{i=1}^n b_i a_ik) φ_k. Thus the representation of σ̂(ψ) is BA. To follow absolutely the notational conventions for representing a linear transformation as given in Chapter II, (2.2), σ̂ should be represented by A^T. However, because we have chosen to represent ψ by the row matrix B, and because σ̂(ψ) is represented by BA, we also use A to represent σ̂. We say that A represents σ̂ with respect to B̂ in V̂ and Â in Û.

In most texts the convention to represent σ̂ by A^T is chosen. The reason we have chosen to represent σ̂ by A is this: in Chapter V we define a closely related linear transformation σ*, the adjoint of σ. The adjoint is not represented by A^T; it is represented by A*, the conjugate complex of the transpose. If we chose to represent σ̂ by A^T, we would have σ represented by A, σ̂ by A^T in both the real and complex case, and σ* represented by A^T in the real case and by the conjugate transpose in the complex case. Thus, the fact that the adjoint is represented by A^T in the real case does not, in itself, provide a compelling reason for representing the dual by A^T. There seems to be less confusion if both σ and σ̂ are represented by A, and σ* is represented by A* (which reduces to A^T in the real case). In a number of other respects our choice results in simplified notation.

If ξ ∈ U, then ψ(σ(ξ)) = σ̂(ψ)(ξ), by definition of σ̂(ψ). If ξ is represented by X, then ψ(σ(ξ)) = B(AX) = (BA)X = σ̂(ψ)(ξ). Thus the representation convention we are using allows us to interpret taking the dual of a linear transformation as equivalent to the associative law. The interpretation could be made to look better if we considered σ as a left operator on U and a right operator on V̂. In other words, write σ(ξ) as σξ and σ̂(ψ) as ψσ. Then ψ(σξ) = (ψσ)ξ would correspond to passing to the dual.

Theorem 5.3. K(σ̂) = Im(σ)^⊥.

proof. If ψ ∈ K(σ̂) ⊂ V̂, then for all α ∈ U, ψ(σ(α)) = σ̂(ψ)(α) = 0. Thus ψ ∈ Im(σ)^⊥. If ψ ∈ Im(σ)^⊥, then for all α ∈ U, σ̂(ψ)(α) = ψ(σ(α)) = 0. Thus ψ ∈ K(σ̂) and K(σ̂) = Im(σ)^⊥. □

Corollary 5.4. A necessary and sufficient condition for the solvability of the linear problem σ(ξ) = β is that β ∈ K(σ̂)^⊥. □

The ideas of this section provide a simple way of proving a very useful theorem concerning the solvability of systems of linear equations. The theorem we prove, worded in terms of linear functionals and duals, may not at first appear to have much to do with linear equations. But, when worded in terms of matrices, it is identical to Theorem 7.2 of Chapter II.



Theorem 5.5. Let σ be a linear transformation of U into V and let β be any vector in V. Either there is a ξ ∈ U such that

(1) σ(ξ) = β,

or there is a φ ∈ V̂ such that

(2) σ̂(φ) = 0 and φβ = 1.

proof. Condition (1) means that β ∈ Im(σ) and condition (2) means that β ∉ K(σ̂)^⊥. Thus the assertion of the theorem follows directly from Theorem 5.3. □

Theorem 5.5 is also equivalent to Theorem 7.2 of Chapter II.

In matrix notation Theorem 5.5 reads: Let A be an m x n matrix and B an m x 1 matrix. Either there is an n x 1 matrix X such that

(1) AX = B,

or there is a 1 x m matrix C such that

(2) CA = 0 and CB = 1.
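For concrete data the alternative can be decided by rank computations: either the augmented matrix has the same rank as A (and a solution X exists), or the left null space of A contains a row C with CB ≠ 0, which can then be rescaled so that CB = 1. A sketch (A and B chosen arbitrarily, not from the text):

```python
# Sketch of the alternative in Theorem 5.5 for concrete data.
# Replace B by Matrix([5, 5, 0]) to see the other branch.
from sympy import Matrix

A = Matrix([[3, 1], [1, 2], [-1, 3]])
B = Matrix([5, 5, 5])

if A.rank() == Matrix.hstack(A, B).rank():       # consistent: a solution X exists
    X, params = A.gauss_jordan_solve(B)
    print("solution X =", X.T)
else:                                            # inconsistent: find C with CA = 0, CB = 1
    C = [v.T for v in A.T.nullspace() if (v.T * B)[0] != 0][0]
    C = C / (C * B)[0]                           # rescale so that CB = 1
    print("certificate C =", C, "CA =", C * A, "CB =", (C * B)[0])
```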

Theorem 5.6. σ and σ̂ have the same rank.

proof. By Theorems 5.3 and 4.1, ν(σ̂) = dim Im(σ)^⊥ = n - ρ(σ), so ρ(σ̂) = n - ν(σ̂) = ρ(σ). □

Theorem 5.7. Let W be a subspace of V invariant under σ. Then W^⊥ is a subspace of V̂ invariant under σ̂.

proof. Let φ ∈ W^⊥. For any α ∈ W we have σ̂(φ)(α) = φσ(α) = 0, since σ(α) ∈ W. Thus σ̂(φ) ∈ W^⊥. □

Theorem 5.8. The dual of a scalar transformation is also a scalar transformation generated by the same scalar.

proof. If σ(α) = aα for all α ∈ V, then for each φ ∈ V̂, (σ̂φ)(α) = φσ(α) = φ(aα) = aφα. □

Theorem 5.9. If λ is an eigenvalue for σ, then λ is also an eigenvalue for σ̂.

proof. If λ is an eigenvalue for σ, then σ - λ is singular. The dual of σ - λ is σ̂ - λ and it must also be singular by Theorem 5.6. Thus λ is an eigenvalue of σ̂. □

Theorem 5.10. Let V have a basis consisting of eigenvectors of σ. Then V̂ has a basis consisting of eigenvectors of σ̂.

proof. Let {α_1, α_2, ..., α_n} be a basis of V, and assume that α_i is an eigenvector of σ with eigenvalue λ_i. Let {φ_1, φ_2, ..., φ_n} be the corresponding dual basis. For all α_j, σ̂(φ_i)(α_j) = φ_iσ(α_j) = φ_i(λ_jα_j) = λ_jφ_iα_j = λ_jδ_ij = λ_iδ_ij. Thus σ̂φ_i = λ_iφ_i and φ_i is an eigenvector of σ̂ with eigenvalue λ_i. □
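In matrix terms Theorem 5.10 says: if the columns of P are eigenvectors of A, then the rows of P^{-1}, which represent the dual basis, satisfy (row)A = λ(row); that is, they are eigenvectors of the dual, which in the convention of this section acts on row matrices by multiplication on the right by A. A sketch with an arbitrary diagonalizable matrix:

```python
# Sketch of Theorem 5.10 in coordinates (matrix chosen with distinct eigenvalues
# so that a basis of eigenvectors exists).
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
eigvals, P = np.linalg.eig(A)       # columns of P: eigenvectors of A
Q = np.linalg.inv(P)                # rows of Q: the dual basis

for lam, row in zip(eigvals, Q):
    print(np.allclose(row @ A, lam * row))   # True: each dual-basis row is a left eigenvector
```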



EXERCISES

1. Show that σ̂τ̂ is the dual of τσ.

2. Let σ be a linear transformation of R^2 into R^3 represented by

        [ 1  -1 ]
    A = [ 2  -4 ]
        [ 2   2 ]

Find a basis for (σ(R^2))^⊥. Find a linear functional that does not annihilate (1, 2, 1). Show that (1, 2, 1) ∉ σ(R^2).

3. The following system of linear equations has no solution. Find the linear functional whose existence is asserted in Theorem 5.5.

    3x_1 +  x_2 = 2
     x_1 + 2x_2 = 1
    -x_1 + 3x_2 = 1.

*6 | Duality of Linear Transformations

In Section 5 we have defined the dual of a linear transformation. What is
the dual of the dual? In considering this question we restrict our attention
to finite dimensional vector spaces. In this case, the mapping J of V into V̂̂,
defined in Section 2, is an isomorphism. Since σ̂̂, the dual of σ̂, is a mapping
of V̂̂ into itself, the isomorphism J allows us to define a corresponding linear
transformation on V. For convenience, we also denote this linear transforma-
tion by σ̂̂. Thus,

σ̂̂(α) = J⁻¹[σ̂̂(J(α))],    (6.1)

where the σ̂̂ on the left is the mapping of V into itself defined by the ex-
pression on the right.

Theorem 6.1. The relation between σ and σ̂ is symmetric; that is, σ is
the dual of σ̂.

proof. By definition,

σ̂̂(J(α))(φ) = J(α)(σ̂φ) = σ̂φ(α) = φσ(α) = J(σ(α))(φ).

Thus σ̂̂(J(α)) = J(σ(α)). By (6.1) this means σ̂̂(α) = J⁻¹[σ̂̂(J(α))] =
J⁻¹[J(σ(α))] = σ(α). Hence, σ is the dual of σ̂. □

The reciprocal nature of duality allows us to establish dual forms of theorems
without a new proof. For example, the dual form of Theorem 5.3 asserts
that K(σ)⊥ = Im(σ̂). We exploit this principle systematically in this section.

Theorem 6.2. The dual of a monomorphism is an epimorphism. The
dual of an epimorphism is a monomorphism.




proof. By Theorem 5.3, Im(σ) = K(σ̂)⊥. If σ is an epimorphism,
Im(σ) = V so that K(σ̂) = V⊥ = {0}. Dually, Im(σ̂) = K(σ)⊥. If σ is a
monomorphism, K(σ) = {0} and Im(σ̂) = Û. □

alternate proof. By Theorems 1.15 and 1.16 of Chapter II, σ is an
epimorphism if and only if τσ = 0 implies τ = 0. Thus σ̂τ̂ = 0 implies
τ̂ = 0 if and only if σ is an epimorphism. Thus σ is an epimorphism if and
only if σ̂ is a monomorphism. Dually, τ is a monomorphism if and only if
τ̂ is an epimorphism. □

Actually, a much more precise form of this theorem can be established.
If W is a subspace of V, the mapping ι of W into V that maps α ∈ W onto
α ∈ V is called the injection of W into V.

Theorem 6.3. Let W be a subspace of V and let ι be the injection mapping
of W into V. Let R be the restriction map of V̂ onto Ŵ. Then ι and R are dual
mappings.

proof. Let φ ∈ V̂. For any α ∈ W, R(φ)(α) = φι(α) = ι̂(φ)(α). Thus
R(φ) = ι̂(φ) for each φ. Hence, R = ι̂. □

Theorem 6.4. If π is a projection of U onto S along T, the dual π̂ is a
projection of Û onto T⊥ along S⊥.

proof. A projection is characterized by the property π² = π. By
Theorem 5.7, π̂² is the dual of π² = π, so π̂² = π̂ and π̂ is also a projection.
By Theorem 5.3, K(π̂) = Im(π)⊥ = S⊥ and Im(π̂) = K(π)⊥ = T⊥. □

A careful comparison of Theorems 6.2 and 6.4 should reveal the perils of
being careless about the domain and codomain of a linear transformation.
A projection π of U onto the proper subspace S is not an epimorphism because
the codomain of π is U, not S. Since π̂ is a projection with the same rank as
π, π̂ cannot be a monomorphism, which it would be if π were an epimorphism.

Theorem 6.5. Let σ be a linear transformation of U into V and let τ be a
linear transformation of V into W. Let σ̂ and τ̂ be the corresponding dual
transformations. If Im(σ) = K(τ), then Im(τ̂) = K(σ̂).

proof. Since Im(σ) ⊂ K(τ), τσ(α) = 0 for all α ∈ U; that is, τσ = 0.
Since σ̂τ̂ is the dual of τσ = 0, σ̂τ̂ = 0 and Im(τ̂) ⊂ K(σ̂). Now dim Im(τ̂) =
dim Im(τ) since τ and τ̂ have the same rank. Thus dim Im(τ̂) = dim Im(τ) =
dim V − dim K(τ) = dim V − dim Im(σ) = dim V̂ − dim Im(σ̂) = dim K(σ̂).
Thus K(σ̂) = Im(τ̂). □

Definition. Experience has shown that the condition Im(σ) = K(τ) is very
useful because it is preserved under a variety of conditions, such as the
taking of duals in Theorem 6.5. Accordingly, this property is given a special
name. We say the sequence of mappings

U —σ→ V —τ→ W    (6.2)



is exact at V if Im(σ) = K(τ). A sequence of mappings of any length is said
to be exact if it is exact at every place where the above condition can apply.
In these terms, Theorem 6.5 says that if the sequence (6.2) is exact at V, the
sequence

Û ←σ̂— V̂ ←τ̂— Ŵ    (6.3)

is exact at V̂. We say that (6.2) and (6.3) are dual sequences of mappings.

Consider the linear transformation σ of U into V. Associated with σ is the
following sequence of mappings

0 → K(σ) —ι→ U —σ→ V —η→ V/Im(σ) → 0,    (6.4)

where ι is the injection mapping of K(σ) into U, and η is the natural homo-
morphism of V onto V/Im(σ). The two mappings at the ends are the only
ones they could be, zero mappings. It is easily seen that this sequence is
exact.

Associated with σ̂ is the exact sequence

0 ← Û/Im(σ̂) ← Û ←σ̂— V̂ ← K(σ̂) ← 0.    (6.5)

By Theorem 6.3 the restriction map R is the dual of ι, and by Theorem 4.5
R and η differ by a natural isomorphism. With the understanding that
Û/Im(σ̂) is isomorphic to the dual of K(σ), and K(σ̂) is isomorphic to the dual
of V/Im(σ), the sequences (6.4) and (6.5) are dual to each other.

*7 | Direct Sums

Definition. If A and B are any two sets, the set of pairs (a, b), where a ∈ A
and b ∈ B, is called the product set of A and B, and is denoted by A × B. If
{Aᵢ | i = 1, 2, . . . , n} is a finite indexed collection of sets, the product set of
the {Aᵢ} is the set of all n-tuples (a₁, a₂, . . . , aₙ), where aᵢ ∈ Aᵢ. This product
set is denoted by X_{i=1}^n Aᵢ. If the index set is not ordered, the description of
the product set is a little more complicated. To see the appropriate generali-
zation, notice that an n-tuple in X_{i=1}^n Aᵢ, in effect, selects one element from
each of the Aᵢ. Generally, if {A_μ | μ ∈ M} is an indexed collection of sets, an
element of the product set X_{μ∈M} A_μ selects for each index μ an element of
A_μ. Thus, an element of X_{μ∈M} A_μ is a function defined on M which associates
with each μ ∈ M an element a_μ ∈ A_μ.

Let {Vᵢ | i = 1, 2, . . . , n} be a collection of vector spaces, all defined over
the same field of scalars F. With appropriate definitions of addition and




scalar multiplication it is possible to make a vector space over F out of the
product set X_{i=1}^n Vᵢ. We define addition and scalar multiplication as follows:

(α₁, . . . , αₙ) + (β₁, . . . , βₙ) = (α₁ + β₁, . . . , αₙ + βₙ)    (7.1)

a(α₁, . . . , αₙ) = (aα₁, . . . , aαₙ).    (7.2)

It is not difficult to show that the axioms of a vector space over F are satisfied,
and we leave this to the reader.

Definition. The vector space constructed from the product set X_{i=1}^n Vᵢ by the
definitions given above is called the external direct sum of the Vᵢ and is
denoted by V₁ ⊕ V₂ ⊕ · · · ⊕ Vₙ = ⊕_{i=1}^n Vᵢ.

If D = ⊕_{i=1}^n Vᵢ is the external direct sum of the Vᵢ, the Vᵢ are not subspaces
of D (for n > 1). The elements of D are n-tuples of vectors while the elements
of any Vᵢ are vectors. For the direct sum defined in Chapter I, Section 4, the
summand spaces were subspaces of the direct sum. If it is necessary to
distinguish between these two direct sums, the direct sum defined in Chapter I
will be called the internal direct sum.

Even though the Vᵢ are not subspaces of D it is possible to map the Vᵢ
monomorphically into D in such a way that D is an internal direct sum of these
images. Associate with αₖ ∈ Vₖ the element (0, . . . , 0, αₖ, 0, . . . , 0) ∈ D,
in which αₖ appears in the kth position. Let us denote this mapping by iₖ.
iₖ is a monomorphism of Vₖ into D, and it is called an injection. It provides
an embedding of Vₖ in D. If Vₖ′ = Im(iₖ) it is easily seen that D is an internal
direct sum of the Vₖ′.

It should be emphasized that the embedding of Vₖ in D provided by the
injection map iₖ is entirely arbitrary even though it looks quite natural. There
are actually infinitely many ways to embed Vₖ in D. For example, let σ be any
linear transformation of Vₖ into V₁ (we assume k ≠ 1). Then define a new
mapping iₖ′ of Vₖ into D in which αₖ ∈ Vₖ is mapped onto (σ(αₖ), 0, . . . , 0,
αₖ, 0, . . . , 0) ∈ D. It is easily seen that iₖ′ is also a monomorphism of Vₖ
into D.

Theorem 7.1. If dim U = m and dim V = n, then dim U ⊕ V = m + n.

proof. Let A = {α₁, . . . , αₘ} be a basis of U and let B = {β₁, . . . , βₙ}
be a basis of V. Then consider the set {(α₁, 0), . . . , (αₘ, 0), (0, β₁), . . . ,
(0, βₙ)} = (A, B) in U ⊕ V. If α = Σ_{i=1}^m aᵢαᵢ and β = Σ_{j=1}^n bⱼβⱼ, then

(α, β) = Σ_{i=1}^m aᵢ(αᵢ, 0) + Σ_{j=1}^n bⱼ(0, βⱼ)

and hence (A, B) spans U ⊕ V. If we have a relation of the form

Σ_{i=1}^m aᵢ(αᵢ, 0) + Σ_{j=1}^n bⱼ(0, βⱼ) = 0,


then

(Σ_{i=1}^m aᵢαᵢ, Σ_{j=1}^n bⱼβⱼ) = 0

and hence Σ_{i=1}^m aᵢαᵢ = 0 and Σ_{j=1}^n bⱼβⱼ = 0. Since A and B are linearly in-
dependent, all aᵢ = 0 and all bⱼ = 0. Thus (A, B) is a basis of U ⊕ V and
U ⊕ V is of dimension m + n. □

It is easily seen that the external direct sum ⊕_{i=1}^n Vᵢ, where dim Vᵢ = mᵢ, is
of dimension Σ_{i=1}^n mᵢ.

We have already noted that we can consider the field F to be a 1-dimen-
sional vector space over itself. With this starting point we can construct
the external direct sum F ⊕ F, which is easily seen to be equivalent to the
2-dimensional coordinate space F². Similarly, we can extend the external
direct sum to include more summands, and consider Fⁿ to be equivalent to
F ⊕ · · · ⊕ F, where this direct sum includes n summands.

We can define a mapping πₖ of D onto Vₖ by the rule πₖ(α₁, . . . , αₙ) = αₖ.
πₖ is called a projection of D onto the kth component. Actually, πₖ is not a
projection in the sense of the definition given in Section II-1, because here
the domain and codomain of πₖ are different and πₖ² is not defined. However,
(iₖπₖ)² = iₖ(πₖiₖ)πₖ = iₖπₖ so that iₖπₖ is a projection. Let Wₖ denote the
kernel of πₖ. It is easily seen that

Wₖ = V₁ ⊕ · · · ⊕ Vₖ₋₁ ⊕ {0} ⊕ Vₖ₊₁ ⊕ · · · ⊕ Vₙ.    (7.3)

The injections and projections defined are related in simple but important
ways. It is readily established that

πₖiₖ = 1_{Vₖ},    (7.4)

πᵢiₖ = 0  for i ≠ k,    (7.5)

i₁π₁ + · · · + iₙπₙ = 1_D.    (7.6)

The mappings iₖπᵢ for i ≠ k are not defined since the domain of iₖ does not
include the codomain of πᵢ.

Conversely, the relations (7.4), (7.5), and (7.6) are sufficient to define the
direct sum. Starting with the Vₖ, the monomorphisms iₖ embed the Vₖ in D.
Let Vₖ′ = Im(iₖ). Let D′ = V₁′ + · · · + Vₙ′. Conditions (7.4) and (7.5) imply
that D′ is a direct sum of the Vₖ′. For if 0 = α₁′ + · · · + αₙ′, with αₖ′ ∈ Vₖ′,
there exist αₖ ∈ Vₖ such that iₖ(αₖ) = αₖ′. Then πₖ(0) = πₖ(α₁′) + · · · +
πₖ(αₙ′) = πₖi₁(α₁) + · · · + πₖiₙ(αₙ) = αₖ = 0. Thus αₖ′ = 0 and the sum is
direct. Condition (7.6) implies that D′ = D.
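
As a concrete coordinate model (our own illustration, not the text's), the
relations (7.4), (7.5), and (7.6) can be checked numerically when D = R² ⊕ R³
is realized as R⁵:

```python
import numpy as np

# i_k injects the k-th summand into D, pi_k projects D back onto it.
dims = [2, 3]
n = sum(dims)
offs = np.cumsum([0] + dims)

def inj(k):                       # i_k : V_k -> D, as an n x dims[k] matrix
    E = np.zeros((n, dims[k]))
    E[offs[k]:offs[k + 1]] = np.eye(dims[k])
    return E

def proj(k):                      # pi_k : D -> V_k, as a dims[k] x n matrix
    return inj(k).T

print(np.allclose(proj(0) @ inj(0), np.eye(2)))                          # (7.4) pi_k i_k = 1
print(np.allclose(proj(1) @ inj(0), 0))                                  # (7.5) pi_i i_k = 0, i != k
print(np.allclose(sum(inj(k) @ proj(k) for k in range(2)), np.eye(n)))   # (7.6) sum i_k pi_k = 1_D
```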

Theorem 7.2. The dual space of U ⊕ V is naturally isomorphic to Û ⊕ V̂.

proof. First of all, if dim U = m and dim V = n, then dim U ⊕ V =
m + n and dim Û ⊕ V̂ = m + n. Since the dual space of U ⊕ V and Û ⊕ V̂ have the same




dimension, there exists an isomorphism between them. The real content
of this theorem, however, is that this isomorphism can be specified in a
natural way independent of any coordinate system.

For (φ, ψ) ∈ Û ⊕ V̂ and (α, β) ∈ U ⊕ V, define

(φ, ψ)(α, β) = φα + ψβ.    (7.7)

It is easy to verify that this mapping of (α, β) ∈ U ⊕ V onto φα + ψβ ∈ F is
linear and, therefore, corresponds to a linear functional, an element of the
dual space of U ⊕ V. It is also easy to verify that the mapping of Û ⊕ V̂ into
the dual space of U ⊕ V that this defines is a linear mapping. Finally, if (φ, ψ)
corresponds to the zero linear functional, then (φ, ψ)(α, 0) = φα = 0 for all
α ∈ U. This implies that φ = 0. In a similar way we can conclude that ψ = 0.
This shows that the mapping of Û ⊕ V̂ into the dual space of U ⊕ V has kernel
{(0, 0)}. Thus the mapping is an isomorphism. □

Corollary 7.3. The dual space of V₁ ⊕ · · · ⊕ Vₙ is naturally isomorphic
to V̂₁ ⊕ · · · ⊕ V̂ₙ. □

The direct sum of an infinite number of spaces is somewhat more com-
plicated. In this case an element of the product set P = X_{μ∈M} V_μ is a function
on the index set M. For α ∈ X_{μ∈M} V_μ, let α_μ = α(μ) denote the value of this
function in V_μ. Then we can define α + β and aα (for a ∈ F) by the rules

(α + β)(μ) = α_μ + β_μ,    (7.8)

(aα)(μ) = aα_μ.    (7.9)

It is easily seen that these definitions convert the product set into a vector
space. As before, we can define injective mappings i_μ of V_μ into P. However,
P is not the direct sum of these image spaces because, in algebra, we permit
sums of only finitely many summands.

Let D be the subset of P consisting of those functions that vanish on all but a
finite number of elements of M. With the operations of vector addition and
scalar multiplication defined in P, D is a subspace. Both D and P are useful
concepts. To distinguish them we call D the external direct sum and P the
direct product. These terms are not universal and the reader of any mathe-
matical literature should be careful about the intended meaning of these or
related terms. To indicate the summands in P and D, we will denote P by
X_{μ∈M} V_μ and D by ⊕_{μ∈M} V_μ.

In a certain sense, the external direct sum and the direct product are dual
concepts. Let i_μ denote the injection of V_μ into P and let π_μ denote the pro-
jection of P onto V_μ. It is easily seen that we have

π_μ i_μ = 1_{V_μ}

and

π_ν i_μ = 0  for ν ≠ μ.




These mappings also have meaning in reference to D. Though we use the
same notation, π_μ requires a restriction of the domain and i_μ requires a
restriction of the codomain. For D the analog of (7.6) is correct,

Σ_{μ∈M} i_μ π_μ = 1_D.    (7.6)′

Even though the left side of (7.6)′ involves an infinite number of terms,
when applied to an element α ∈ D,

(Σ_{μ∈M} i_μ π_μ)(α) = Σ_{μ∈M} (i_μ π_μ)(α) = Σ_{μ∈M} i_μ(α_μ) = α    (7.10)

involves only a finite number of terms. An analog of (7.6) for the direct
product is not available.

Consider the diagram of mappings

V_μ —i_μ→ D —π_ν→ V_ν,    (7.11)

and consider the dual diagram

V̂_μ ←î_μ— D̂ ←π̂_ν— V̂_ν.    (7.12)

For ν ≠ μ, π_ν i_μ = 0. Thus î_μ π̂_ν = 0. For ν = μ, î_μ π̂_μ is the dual of
π_μ i_μ = 1, and hence î_μ π̂_μ = 1. By Theorem 6.2, î_μ is an epimorphism and
π̂_μ is a monomorphism. Thus π̂_μ is an injection of V̂_μ into D̂, and î_μ is a
projection of D̂ onto V̂_μ.

Theorem 7.4. If D is the external direct sum of the indexed collection
{V_μ | μ ∈ M}, then D̂ is isomorphic to the direct product of the indexed collection
{V̂_μ | μ ∈ M}.

proof. Let φ ∈ D̂. For each μ ∈ M, φi_μ is a linear functional defined on
V_μ; that is, φi_μ corresponds to an element in V̂_μ. In this way we define a
function on M which has at μ ∈ M the value φi_μ ∈ V̂_μ. By definition, this is
an element in X_{μ∈M} V̂_μ. It is easy to check that this mapping of D̂ into the
direct product X_{μ∈M} V̂_μ is linear.

If φ ≠ 0, there is an α ∈ D such that φα ≠ 0. Since φα = φ[(Σ_{μ∈M} i_μ π_μ)(α)] =
Σ_{μ∈M} φi_μ(π_μ(α)) ≠ 0, there is a μ ∈ M such that φi_μ(π_μ(α)) ≠ 0. Since
π_μ(α) ∈ V_μ, φi_μ ≠ 0. Thus, the kernel of the mapping of D̂ into X_{μ∈M} V̂_μ is
zero.

Finally, we show that this mapping is an epimorphism. Let ψ ∈ X_{μ∈M} V̂_μ.
Let ψ_μ = ψ(μ) ∈ V̂_μ be the value of ψ at μ. For α ∈ D, define φα =
Σ_{μ∈M} ψ_μ(π_μ α). This sum is defined since π_μ α = 0 for all but finitely many μ.




For α_ν ∈ V_ν,

φ(i_ν(α_ν)) = Σ_{μ∈M} ψ_μ(π_μ i_ν(α_ν)) = ψ_ν(α_ν).    (7.13)

This shows that ψ is the image of φ. Hence, D̂ and X_{μ∈M} V̂_μ are isomorphic. □

While Theorem 7.4 shows that the direct product is the dual of the exter-
nal direct sum, the external direct sum is generally not the dual of the direct
product. This conclusion follows from a fact (not proven in this book) that
infinite dimensional vector spaces are not reflexive. However, there is more
symmetry in this relationship than this negative assertion seems to indicate.
This is brought out in the next two theorems.

Theorem 7.5. Let {V_μ | μ ∈ M} be an indexed collection of vector spaces
over F and let {σ_μ | μ ∈ M} be an indexed collection of linear transformations,
where σ_μ has domain V_μ and codomain U for all μ. Then there is a unique linear
transformation σ of ⊕_{μ∈M} V_μ into U such that σ_μ = σi_μ for each μ.

proof. Define

σ = Σ_{μ∈M} σ_μ π_μ.    (7.14)

For each α ∈ ⊕_{μ∈M} V_μ, σ(α) = Σ_{μ∈M} σ_μ π_μ(α) is well defined since only a finite
number of terms on the right are non-zero. Then, for α_ν ∈ V_ν,

σi_ν(α_ν) = Σ_{μ∈M} σ_μ π_μ i_ν(α_ν)
         = σ_ν(α_ν).    (7.15)

Thus σi_ν = σ_ν.

If σ′ is another linear transformation of ⊕_{μ∈M} V_μ into U such that σ_μ = σ′i_μ,
then

σ′ = σ′1_D
   = σ′ Σ_{μ∈M} i_μ π_μ
   = Σ_{μ∈M} σ′i_μ π_μ
   = Σ_{μ∈M} σ_μ π_μ
   = σ.

Thus, the σ with the desired property is unique. □




Theorem 7.6. Let {V_μ | μ ∈ M} be an indexed collection of vector spaces
over F and let {τ_μ | μ ∈ M} be an indexed collection of linear transformations
where τ_μ has domain W and codomain V_μ for all μ. Then there is a linear
transformation τ of W into X_{μ∈M} V_μ such that τ_μ = π_μτ for each μ.

proof. Let α ∈ W be given. Since τ(α) is supposed to be in X_{μ∈M} V_μ,
τ(α) is a function on M which for μ ∈ M has a value in V_μ. Define

τ(α)(μ) = τ_μ(α).    (7.16)

Then

π_μτ(α) = τ(α)(μ) = τ_μ(α),    (7.17)

so that π_μτ = τ_μ. □

The distinction between the external direct sum and the direct product is
that the external direct sum is too small to replace the direct product in
Theorem 7.6. This replacement could be done only if the indexed collection
of linear transformations were restricted so that for each α ∈ W only finitely
many mappings have non-zero values τ_μ(α).

The properties of the external direct sum and the direct product established
in Theorems 7.5 and 7.6 are known as "universal factoring" properties. In
Theorem 7.5 we have shown that any collection of mappings of the V_μ into a
space U can be factored through D. In Theorem 7.6 we have shown that any
collection of mappings of W into the V_μ can be factored through P. Theorems
7.7 and 7.8 show that D and P are the smallest spaces with these properties.

Theorem 7.7. Let W be a vector space over F with an indexed collection
of linear transformations {λ_μ | μ ∈ M} where each λ_μ has domain V_μ and co-
domain W. Suppose that, for any indexed collection of linear transformations
{σ_μ | μ ∈ M} with domain V_μ and codomain U, there exists a linear transforma-
tion λ of W into U such that σ_μ = λλ_μ. Then there exists a monomorphism of
D into W.

proof. By assumption, there exists a linear transformation λ of W into
D such that i_μ = λλ_μ. By Theorem 7.5 there is a unique linear transformation
σ of D into W such that λ_μ = σi_μ. Then

1_D = Σ_{μ∈M} i_μ π_μ
    = Σ_{μ∈M} λλ_μ π_μ
    = λσ Σ_{μ∈M} i_μ π_μ
    = λσ.    (7.18)

This means that σ is a monomorphism and λ is an epimorphism. □




Theorem 7.8. Let Y be a vector space over F with an indexed collection
of linear transformations {θ_μ | μ ∈ M} where each θ_μ has domain Y and codomain
V_μ. Suppose that, for any indexed collection of linear transformations
{τ_μ | μ ∈ M} with domain W and codomain V_μ, there exists a linear transfor-
mation θ of W into Y such that τ_μ = θ_μθ. Then P is isomorphic to a subspace
of Y.

proof. With P in place of W and π_μ in place of τ_μ, the assumptions
of the theorem say there is a linear transformation θ of P into Y such that
π_μ = θ_μθ for each μ. By Theorem 7.6 there is a linear transformation τ of Y
into P such that θ_μ = π_μτ for each μ. Then

π_μ = θ_μθ = π_μτθ.

Recall that α ∈ P is a function defined on M that has at μ ∈ M a value α_μ
in V_μ. Thus α is uniquely defined by its values. For μ ∈ M

π_μ(τθ(α)) = π_μ(α) = α_μ.

Thus τθ(α) = α and τθ = 1_P. This means that θ is a monomorphism and τ
is an epimorphism and P is isomorphic to Im(θ). □

Theorem 7.9. Suppose a space D′ is given with an indexed collection of
monomorphisms {i_μ′ | μ ∈ M} of V_μ into D′ and an indexed collection of epi-
morphisms {π_μ′ | μ ∈ M} of D′ onto V_μ such that

π_μ′ i_μ′ = 1_{V_μ},

π_ν′ i_μ′ = 0  for ν ≠ μ,

Σ_{μ∈M} i_μ′ π_μ′ = 1_{D′}.

Then D and D′ are isomorphic.

This theorem says, in effect, that conditions (7.4), (7.5), and (7.6)′
characterize the external direct sum.

proof. For α ∈ D′ let α_μ = π_μ′(α). We wish to show first that for a given
α ∈ D′ only finitely many α_μ are non-zero. By (7.6)′, α = 1_{D′}(α) =
Σ_{μ∈M} i_μ′ π_μ′(α) = Σ_{μ∈M} i_μ′(α_μ). Thus, only finitely many of the i_μ′(α_μ) are
non-zero. Since i_μ′ is a monomorphism, only finitely many of the α_μ are non-zero.

Now suppose that {σ_μ | μ ∈ M} is an indexed collection of linear transforma-
tions with domain V_μ and codomain U. Define λ = Σ_{μ∈M} σ_μ π_μ′. For α ∈ D′,
λ(α) = Σ_{μ∈M} σ_μ π_μ′(α) = Σ_{μ∈M} σ_μ(α_μ) is defined in U since only finitely many α_μ
are non-zero. Also, λi_ν′ = Σ_{μ∈M} σ_μ π_μ′ i_ν′ = σ_ν. Thus D′ satisfies the condi-
tions on W in Theorem 7.7.

Repeating the steps of the proof of Theorem 7.7, we have a monomorphism
σ of D into D′ and an epimorphism λ of D′ onto D such that 1_D = λσ. But



we also have

1_{D′} = Σ_{μ∈M} i_μ′ π_μ′ = Σ_{μ∈M} σ i_μ π_μ′ = σλ.

Since σ is both a monomorphism and an epimorphism, D and D′ are iso-
morphic. □

The direct product cannot be characterized quite so neatly. Although
the direct product has a collection of mappings satisfying (7.4) and (7.5),
(7.6)′ is not satisfied for this collection if M is an infinite set. The universal
factoring property established for direct products in Theorem 7.6 is inde-
pendent of (7.4) and (7.5), since direct sums satisfy (7.4) and (7.5) but not
the universal factoring property of Theorem 7.6. We can combine these three
conditions and state the following theorem.

Theorem 7.10. Let P′ be a vector space over F with an indexed collection of
monomorphisms {i_μ′ | μ ∈ M} of V_μ into P′ and an indexed collection of epi-
morphisms {π_μ′ | μ ∈ M} of P′ onto V_μ such that

π_μ′ i_μ′ = 1_{V_μ},

π_ν′ i_μ′ = 0  for ν ≠ μ,

and such that if {ρ_μ | μ ∈ M} is any indexed collection of linear transformations
with domain W and codomain V_μ, there is a linear transformation ρ of W into
P′ such that ρ_μ = π_μ′ρ for each μ. If P′ is minimal with respect to these three
properties, then P and P′ are isomorphic.

When we say that P′ is minimal with respect to these three properties we
mean: Let P″ be a subspace of P′ and let π_μ″ be the restriction of π_μ′ to P″.
If there exists an indexed collection of monomorphisms {i_μ″ | μ ∈ M} with
domain V_μ and codomain P″ such that (7.4), (7.5) and the universal factoring
properties are satisfied with i_μ″ in place of i_μ′ and π_μ″ in place of π_μ′, then P″ = P′.

proof. By Theorem 7.8, P is isomorphic to a subspace of P′. Let θ be the
isomorphism and let P″ = Im(θ). With appropriate changes in notation
(P′ in place of Y and π_μ′ in place of θ_μ), the proof of Theorem 7.8 yields
the relations

π_μ = π_μ′θ,    π_μ′ = π_μτ,    τθ = 1_P,




where τ is an epimorphism of P′ onto P. Thus, if π_μ″ is the restriction of π_μ′
to P″, we have

π_μ″θ = π_μ′θ = π_μ.

This shows that π_μ″ is an epimorphism.
Now let i_μ″ = θi_μ. Then

π_μ″ i_μ″ = π_μ′θi_μ = π_μ i_μ = 1_{V_μ},

and

π_ν″ i_μ″ = π_ν′θi_μ = π_ν i_μ = 0  for ν ≠ μ.

Since P has the universal factoring property, let ρ be a linear transformation
of W into P such that ρ_μ = π_μρ for each μ. Then

ρ_μ = π_μρ = π_μ″θρ = π_μ″ρ″

for each μ, where ρ″ = θρ. This shows that P″ has the universal factoring
property of Theorem 7.6. Since we have assumed P′ is minimal, we have
P″ = P′ so that P and P′ are isomorphic. □

8 | Bilinear Forms

Definition. Let U and V be two vector spaces with the same field of scalars
F. Let f be a mapping of pairs of vectors, one from U and one from V, into
the field of scalars such that f(α, β), where α ∈ U and β ∈ V, is a linear
function of α and β separately. Thus,

f(a₁α₁ + a₂α₂, b₁β₁ + b₂β₂) = a₁f(α₁, b₁β₁ + b₂β₂) + a₂f(α₂, b₁β₁ + b₂β₂)
    = a₁b₁f(α₁, β₁) + a₁b₂f(α₁, β₂)
    + a₂b₁f(α₂, β₁) + a₂b₂f(α₂, β₂).    (8.1)

Such a mapping is called a bilinear form. In most cases we shall have U = V.

(1) Take U = V = Rⁿ and F = R. Let A = {α₁, . . . , αₙ} be a basis in
Rⁿ. For ξ = Σ_{i=1}^n xᵢαᵢ and η = Σ_{i=1}^n yᵢαᵢ we may define f(ξ, η) = Σ_{i=1}^n xᵢyᵢ.
This is a bilinear form and it is known as the inner, or dot, product.

(2) We can take F = R and U = V = the space of continuous real-valued
functions on the interval [0, 1]. We may then define f(α, β) = ∫₀¹ α(x)β(x) dx.
This is an infinite dimensional form of an inner product. It is a bilinear form.

As usual, we proceed to define the matrices representing bilinear forms
with respect to bases in U and V and to see how these matrices are transformed
when the bases are changed.

Let A = {α₁, . . . , αₘ} be a basis in U and let B = {β₁, . . . , βₙ} be a basis
in V. Then, for any α ∈ U, β ∈ V, we have α = Σ_{i=1}^m xᵢαᵢ and β = Σ_{j=1}^n yⱼβⱼ,



where xᵢ, yⱼ ∈ F. Then

f(α, β) = f(Σ_{i=1}^m xᵢαᵢ, Σ_{j=1}^n yⱼβⱼ)
       = Σ_{i=1}^m Σ_{j=1}^n xᵢyⱼ f(αᵢ, βⱼ).    (8.2)

Thus we see that the value of the bilinear form is known and determined
for any α ∈ U, β ∈ V, as soon as we specify the mn values f(αᵢ, βⱼ). Con-
versely, values can be assigned to f(αᵢ, βⱼ) in an arbitrary way and f(α, β)
can be defined uniquely for all α ∈ U, β ∈ V, because A and B are bases in
U and V, respectively.

We denote f(αᵢ, βⱼ) by bᵢⱼ and define B = [bᵢⱼ] to be the matrix represent-
ing the bilinear form with respect to the bases A and B. We can use the
m-tuple X = (x₁, . . . , xₘ) to represent α and the n-tuple Y = (y₁, . . . , yₙ)
to represent β. Then

f(α, β) = Σ_{i=1}^m Σ_{j=1}^n xᵢbᵢⱼyⱼ = XᵀBY.    (8.3)

(Remember, our convention is to use an m-tuple X = (x₁, . . . , xₘ) to
represent an m × 1 matrix. Thus X and Y are one-column matrices.)
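
The following short Python sketch (our own example; the particular B, X,
and Y are arbitrary) evaluates a bilinear form through the matrix product
of (8.3):

```python
import numpy as np

# f(alpha, beta) = X^T B Y, where X and Y are the coordinate columns of alpha and beta.
B = np.array([[1., 2, 0],
              [-1, -1, 6]])          # a 2 x 3 matrix: dim U = 2, dim V = 3
X = np.array([2., 1])                # coordinates of alpha in the basis of U
Y = np.array([1., 0, -1])            # coordinates of beta in the basis of V
print(X @ B @ Y)                     # the value f(alpha, beta)
```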

Suppose, now, that A′ = {α₁′, . . . , αₘ′} is a new basis of U with matrix
of transition P, and that B′ = {β₁′, . . . , βₙ′} is a new basis of V with matrix
of transition Q. The matrix B′ = [bᵢⱼ′] representing f with respect to these
new bases is determined as follows:

bᵢⱼ′ = f(αᵢ′, βⱼ′) = f(Σ_{r=1}^m pᵣᵢαᵣ, Σ_{s=1}^n qₛⱼβₛ)
    = Σ_{r=1}^m pᵣᵢ Σ_{s=1}^n qₛⱼ f(αᵣ, βₛ)
    = Σ_{r=1}^m Σ_{s=1}^n pᵣᵢ bᵣₛ qₛⱼ.    (8.4)




Thus,

B′ = PᵀBQ.    (8.5)

From now on we assume that U = V. Then when we change from one
basis to another, there is but one matrix of transition and P = Q in the
discussion above. Hence a change of basis leads to a new representation
of f in the form

B′ = PᵀBP.    (8.6)

Definition. The matrices B and PᵀBP, where P is non-singular, are said to
be congruent.
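
A quick numerical check of (8.6) (our own example) shows that PᵀBP
represents the same form in the new coordinates:

```python
import numpy as np

# If P is the matrix of transition, B' = P^T B P represents f in the new basis,
# and the value of the form is unchanged when X = P X' and Y = P Y'.
B = np.array([[2., 1], [1, 3]])
P = np.array([[1., 1], [0, 1]])                     # non-singular matrix of transition
Bp = P.T @ B @ P                                    # the congruent matrix B'
Xp, Yp = np.array([1., 2]), np.array([3., -1])      # coordinates in the new basis
X, Y = P @ Xp, P @ Yp                               # the same vectors in the old basis
print(np.isclose(X @ B @ Y, Xp @ Bp @ Yp))          # True
```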

Congruence is another equivalence relation among matrices. Notice
that the particular kind of equivalence relation that is appropriate and
meaningful depends on the underlying concept which the matrices are
used to represent. Still other equivalence relations appear later. This
occurs, for example, when we place restrictions on the types of bases we
allow.

Definition. If f(α, β) = f(β, α) for all α, β ∈ V, we say that the bilinear form
f is symmetric. Notice that for this definition to have meaning it is necessary
that the bilinear form be defined on pairs of vectors from the same vector
space, not from different vector spaces. If f(α, α) = 0 for all α ∈ V, we say
that the bilinear form f is skew-symmetric.

Theorem 8.1. A bilinear form f is symmetric if and only if any matrix B
representing f has the property Bᵀ = B.

proof. The matrix B = [bᵢⱼ] is determined by bᵢⱼ = f(αᵢ, αⱼ). If f is
symmetric, then bⱼᵢ = f(αⱼ, αᵢ) = f(αᵢ, αⱼ) = bᵢⱼ so that Bᵀ = B.

If Bᵀ = B, we say the matrix B is symmetric. We shall soon see that
symmetric bilinear forms and symmetric matrices are particularly important.

If Bᵀ = B, then f(αᵢ, αⱼ) = bᵢⱼ = bⱼᵢ = f(αⱼ, αᵢ). Thus f(α, β) =
f(Σ_{i=1}^n xᵢαᵢ, Σ_{j=1}^n yⱼαⱼ) = Σ_{i=1}^n Σ_{j=1}^n xᵢyⱼ f(αᵢ, αⱼ) = Σ_{i=1}^n Σ_{j=1}^n xᵢyⱼ f(αⱼ, αᵢ) =
f(β, α). It then follows that any other matrix representing f will be symmetric;
that is, if B is symmetric, then PᵀBP is also symmetric. □

Theorem 8.2. If a bilinear form f is skew-symmetric, then any matrix B
representing f has the property Bᵀ = −B.

proof. For any α, β ∈ V, 0 = f(α + β, α + β) = f(α, α) + f(α, β) +
f(β, α) + f(β, β) = f(α, β) + f(β, α). From this it follows that f(α, β) =
−f(β, α) and hence Bᵀ = −B. □

Theorem 8.3. If 1 + 1 ≠ 0 and the matrix B representing f has the
property Bᵀ = −B, then f is skew-symmetric.




proof. Suppose that Bᵀ = −B, or f(α, β) = −f(β, α) for all α, β ∈ V.
Then f(α, α) = −f(α, α), from which we have f(α, α) + f(α, α) =
(1 + 1)f(α, α) = 0. Thus, if 1 + 1 ≠ 0, we can conclude that f(α, α) = 0
so that f is skew-symmetric. □

If Bᵀ = −B, we say the matrix B is skew-symmetric. The importance
of symmetric and skew-symmetric bilinear forms is implicit in

Theorem 8.4. If 1 + 1 ≠ 0, every bilinear form can be represented
uniquely as a sum of a symmetric bilinear form and a skew-symmetric bilinear
form.

proof. Let f be the given bilinear form. Define f_s(α, β) = ½[f(α, β) +
f(β, α)] and f_ss(α, β) = ½[f(α, β) − f(β, α)]. (The assumption that 1 + 1 ≠ 0
is required to assure that the coefficient "½" has meaning.) It is clear that
f_s(α, β) = f_s(β, α) and f_ss(α, α) = 0 so that f_s is symmetric and f_ss is skew-
symmetric.

We must yet show that this representation is unique. Thus, suppose that
f(α, β) = f₁(α, β) + f₂(α, β) where f₁ is symmetric and f₂ is skew-symmetric.
Then f(α, β) + f(β, α) = f₁(α, β) + f₂(α, β) + f₁(β, α) + f₂(β, α) =
2f₁(α, β). Hence f₁(α, β) = ½[f(α, β) + f(β, α)]. It follows immediately
that f₂(α, β) = ½[f(α, β) − f(β, α)]. □

We shall, for the rest of this book, assume that 1 + 1 ≠ 0 even where
such an assumption is not explicitly mentioned.
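
In matrix language the decomposition of Theorem 8.4 is B = (B + Bᵀ)/2 +
(B − Bᵀ)/2. The following sketch (our own, using the matrix that also appears
in Exercise 2 below) verifies that the two parts have the stated symmetry:

```python
import numpy as np

B = np.array([[1., 2, 3], [4, 5, 6], [7, 8, 9]])
Bs = (B + B.T) / 2                    # the symmetric part
Bss = (B - B.T) / 2                   # the skew-symmetric part
print(np.allclose(Bs + Bss, B), np.allclose(Bs.T, Bs), np.allclose(Bss.T, -Bss))
```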

EXERCISES 

1. Let α = (x₁, x₂) ∈ R² and let β = (y₁, y₂, y₃) ∈ R³. Then consider the bilinear
form

f(α, β) = x₁y₁ + 2x₁y₂ − x₂y₁ − x₂y₂ + 6x₁y₃.

Determine the 2 × 3 matrix representing this bilinear form.

2. Express the matrix

⎡1 2 3⎤
⎢4 5 6⎥
⎣7 8 9⎦

as the sum of a symmetric matrix and a skew-symmetric matrix.

3. Show that if B is symmetric, then PᵀBP is symmetric for each P, singular or
non-singular. Show that if B is skew-symmetric, then PᵀBP is skew-symmetric
for each P.

4. Show that if A is any m × n matrix, then AᵀA and AAᵀ are symmetric.

5. Show that a skew-symmetric matrix of odd order must be singular.




6. Let f be a bilinear form defined on U and V. Show that, for each α ∈ U,
f(α, β) defines a linear functional φ_α on V; that is,

φ_α(β) = f(α, β).

With this fixed f show that the mapping of α ∈ U onto φ_α ∈ V̂ is a linear trans-
formation of U into V̂.

7. (Continuation) Let the linear transformation of U into V̂ defined in Exercise
6 be denoted by σ_f. Show that there is an α ∈ U, α ≠ 0, such that f(α, β) = 0 for
all β if and only if the nullity of σ_f is positive.

8. (Continuation) Show that for each β ∈ V, f(α, β) defines a linear functional ψ_β
on U. The mapping of β ∈ V onto ψ_β ∈ Û is a linear transformation τ_f of V into Û.

9. (Continuation) Show that σ_f and τ_f have the same rank.

10. (Continuation) Show that, if U and V are of different dimensions, there
must be either an α ∈ U, α ≠ 0, such that f(α, β) = 0 for all β ∈ V or a β ∈ V,
β ≠ 0, such that f(α, β) = 0 for all α ∈ U. Show that the same conclusion follows
if the matrix representing f is square but singular.

11. Let U₀ be the set of all α ∈ U such that f(α, β) = 0 for all β ∈ V. Similarly,
let V₀ be the set of all β ∈ V such that f(α, β) = 0 for all α ∈ U. Show that U₀ is a
subspace of U and that V₀ is a subspace of V.

12. (Continuation) Show that m − dim U₀ = n − dim V₀.

13. Show that if f is a skew-symmetric bilinear form, then f(α, β) = −f(β, α)
for all α, β ∈ V.

14. Show by an example that, if A and B are symmetric, it is not necessarily true
that AB is symmetric. What can be concluded if A and B are symmetric and
AB = BA?

15. Under what conditions on B does it follow that XᵀBX = 0 for all X?

16. Show the following: If A is skew-symmetric, then A² is symmetric. If A is
skew-symmetric and B is symmetric, then AB − BA is symmetric. If A is skew-
symmetric and B is symmetric, then AB is skew-symmetric if and only if AB = BA.

9 | Quadratic Forms

Definition. A quadratic form is a function q on a vector space defined by
setting q(α) = f(α, α), where f is a bilinear form on that vector space.

If f is represented as a sum of a symmetric and a skew-symmetric bilinear
form, f(α, β) = f_s(α, β) + f_ss(α, β) where f_s is symmetric and f_ss is skew-
symmetric, then q(α) = f_s(α, α) + f_ss(α, α) = f_s(α, α). Thus q is completely
determined by the symmetric part of f alone. In addition, two different
bilinear forms with the same symmetric part must generate the same quadratic
form.

We see, therefore, that if a quadratic form is given we should not expect




to be able to specify the bilinear form from which it is obtained. At best
we can expect to specify the symmetric part of the underlying bilinear form.
This symmetric part is itself a bilinear form from which q can be obtained.
Each other possible underlying bilinear form will differ from this symmetric
bilinear form by a skew-symmetric term.

What is the symmetric part of the underlying bilinear form expressed
in terms of the given quadratic form? We can obtain a hint of what it should
be by regarding the simple quadratic function x² as obtained from the bilinear
function xy. Now (x + y)² = x² + xy + yx + y². Thus if xy = yx (sym-
metry), we can express xy in terms of squares, xy = ½[(x + y)² − x² − y²].
In general, we see that the symmetric part of the underlying bilinear form
can be recovered from the quadratic form by means of the formula

½[q(α + β) − q(α) − q(β)]
    = ½[f(α + β, α + β) − f(α, α) − f(β, β)]
    = ½[f(α, α) + f(α, β) + f(β, α) + f(β, β) − f(α, α) − f(β, β)]
    = ½[f(α, β) + f(β, α)]
    = f_s(α, β).    (9.1)

f_s is the symmetric part of f. Thus it is readily seen that

Theorem 9.1. Every symmetric bilinear form f_s determines a unique
quadratic form by the rule q(α) = f_s(α, α), and if 1 + 1 ≠ 0, every quadratic
form determines a unique symmetric bilinear form f_s(α, β) = ½[q(α + β) −
q(α) − q(β)] from which it is in turn determined by the given rule. There is a
one-to-one correspondence between symmetric bilinear forms and quadratic
forms. □
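
The correspondence can be checked numerically; in the sketch below (our
own example) B is deliberately not symmetric, and the polarization rule of
Theorem 9.1 recovers exactly its symmetric part:

```python
import numpy as np

B = np.array([[1., 4], [0, 2]])                 # a non-symmetric representing matrix
q = lambda X: X @ B @ X                          # the quadratic form q(X) = X^T B X
f_s = lambda X, Y: (q(X + Y) - q(X) - q(Y)) / 2  # polarization, as in (9.1)
X, Y = np.array([1., -2]), np.array([3., 5])
print(np.isclose(f_s(X, Y), X @ ((B + B.T) / 2) @ Y))   # True: the polar form of q
```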

The significance of Theorem 9.1 is that, to treat quadratic forms ade-
quately, it is sufficient to consider symmetric bilinear forms. It is fortunate
that symmetric bilinear forms and symmetric matrices are very easy to
handle. Among the many possible bilinear forms corresponding to a given
quadratic form a symmetric bilinear form can always be selected. Hence,
among the many possible matrices that could be chosen to represent a given
quadratic form, a symmetric matrix can always be selected.

The unique symmetric bilinear form f_s obtainable from a given quadratic
form q is called the polar form of q.

It is desirable at this point to give a geometric interpretation of quadratic
forms and their corresponding polar forms. This application of quadratic
forms is by no means the most important, but it is the source of much of the
terminology. In a Euclidean plane with Cartesian coordinate system, let
(x) = (x₁, x₂) be the coordinates of a general point. Then

q((x)) = x₁² − 4x₁x₂ + 2x₂²




is a quadratic function of the coordinates and it is a particular quadratic
form. The set of all points (x) for which q((x)) = 1 is a conic section (in
this case a hyperbola).

Now, let (y) = (y₁, y₂) be the coordinates of another point. Then

f_s((x), (y)) = x₁y₁ − 2x₁y₂ − 2x₂y₁ + 2x₂y₂

is a function of both (x) and (y) and it is linear in the coordinates of each
point separately. It is a bilinear form, the polar form of q. For a fixed
(x), the set of all (y) for which f_s((x), (y)) = 1 is a straight line. This straight
line is called the polar of (x) and (x) is called the pole of the straight line.

The relations between poles and polars are quite interesting and are ex-
plored in great depth in projective geometry. One of the simplest relations
is that if (x) is on the conic section defined by q((x)) = 1, then the polar of
(x) is tangent to the conic at (x). This is often shown in courses in analytic
geometry and it is an elementary exercise in calculus.

We see that the matrix representing f_s((x), (y)), and therefore also q((x)), is

⎡ 1 −2⎤
⎣−2  2⎦ .



EXERCISES

1. Find the symmetric matrix representing each of the following quadratic
forms:

(a) 2x² + 3xy + 6y²
(b) 8xy + 4y²
(c) x² + 2xy + 4xz + 3y² + yz + 7z²
(d) 4xy
(e) x² + 4xy + 4y² + 2xz + z² + 4yz
(f) x² + 4xy − 2y²
(g) x² + 6xy − 2y² − 2yz + z².

2. Write down the polar form for each of the quadratic forms of Exercise 1.

3. Show that the polar form f_s of the quadratic form q can be recovered from the
quadratic form by the formula

f_s(α, β) = ¼{q(α + β) − q(α − β)}.

10 | The Normal Form

Since the symmetry of the polar form f_s is independent of any coordinate
system, the matrix representing f_s with respect to any coordinate system
will be symmetric. The simplest of all symmetric matrices are those for
which the elements not on the main diagonal are all zeros, the diagonal
matrices. A great deal of the usefulness and importance of symmetric






bilinear forms lies in the fact that for each symmetric bilinear form, over
a field in which 1 + 1 ≠ 0, there exists a coordinate system in which the
matrix representing the symmetric bilinear form is a diagonal matrix. Neither
the coordinate system nor the diagonal matrix is unique.

Theorem 10.1. For a given symmetric matrix B over a field F (in which
1 + 1 ≠ 0), there is a non-singular matrix P such that PᵀBP is a diagonal
matrix. In other words, if f_s is the underlying symmetric bilinear (polar)
form, there is a basis A′ = {α₁′, . . . , αₙ′} of V such that f_s(αᵢ′, αⱼ′) = 0 whenever
i ≠ j.

proof. The proof is by induction on n, the order of B. If n = 1, the
theorem is obviously true (every 1 × 1 matrix is diagonal). Suppose the
assertion of the theorem has already been established for a symmetric
bilinear form in a space of dimension n − 1. If B = 0, then it is already
diagonal. Thus we may as well assume that B ≠ 0. Let f_s and q be the
corresponding symmetric bilinear and quadratic forms. We have already
shown that

f_s(α, β) = ½[q(α + β) − q(α) − q(β)].    (10.1)

The significance of this equation at this point is that if q(α) = 0 for all α,
then f_s(α, β) = 0 for all α and β. Hence, there is an α₁′ ∈ V such that q(α₁′) =
d₁ ≠ 0.

With this α₁′ held fixed, the bilinear form f_s(α₁′, α) defines a linear functional
φ₁′ on V. This linear functional is not zero since φ₁′α₁′ = d₁ ≠ 0. Thus the
subspace W₁ annihilated by this linear functional is of dimension n − 1.

Consider f_s restricted to W₁. This is a symmetric bilinear form on W₁
and, by assumption, there is a basis {α₂′, . . . , αₙ′} of W₁ such that f_s(αᵢ′, αⱼ′) = 0
if i ≠ j and 2 ≤ i, j ≤ n. However, f_s(α₁′, αⱼ′) = f_s(αⱼ′, α₁′) = 0 because of
symmetry and the fact that αⱼ′ ∈ W₁ for j ≥ 2. Thus f_s(αᵢ′, αⱼ′) = 0 if i ≠ j for
1 ≤ i, j ≤ n. □

Let P be the matrix of transition from the original basis A = {α₁, . . . , αₙ}
to the new basis A′ = {α₁′, . . . , αₙ′}. Then PᵀBP = B′ is of the form

B′ = diag(d₁, d₂, . . . , dᵣ, 0, . . . , 0).

In this display of B′ the first r elements of the main diagonal are non-zero






and all other elements of B′ are zero. r is the rank of B′ and B, and it is
also called the rank of the corresponding bilinear or quadratic form.

The dᵢ along the main diagonal are not uniquely determined. We can
introduce a third basis A″ = {α₁″, . . . , αₙ″} such that αᵢ″ = xᵢαᵢ′ where xᵢ ≠ 0.
Then the matrix of transition Q from the basis A′ to the basis A″ is a diagonal
matrix with x₁, . . . , xₙ down the main diagonal. The matrix B″ representing
the symmetric bilinear form with respect to the basis A″ is

B″ = QᵀB′Q = diag(d₁x₁², d₂x₂², . . . , dᵣxᵣ², 0, . . . , 0).



Thus the elements in the main diagonal may be multiplied by arbitrary
non-zero squares from F.

By taking

B′ = ⎡1  0⎤ ,   P = ⎡2  1⎤ ,
     ⎣0 −1⎦         ⎣1  2⎦

we get

B″ = PᵀB′P = ⎡3  0⎤ .
             ⎣0 −3⎦

Thus, it is possible to change the elements in the main diagonal
by factors which are not squares. However, |B″| = |B′| · |P|² so that it
is not possible to change just one element of the main diagonal by a non-
square factor. The question of just what changes in the quadratic form can
be effected by P with rational elements is a question which opens the door
to the arithmetic theory of quadratic forms, a branch of number theory.

Little more can be said without knowledge of which numbers in the field
of scalars can be squares. In the field of complex numbers every number
is a square; that is, every complex number has at least one square root.
Therefore, for each dᵢ ≠ 0 we can choose xᵢ = 1/√dᵢ so that dᵢxᵢ² = 1.
In this case the non-zero numbers appearing in the main diagonal of B″
are all 1's. Thus we have proved

Theorem 10.2. If F is the field of complex numbers, then every symmetric
matrix B is congruent to a diagonal matrix in which all the non-zero elements
are 1's. The number of 1's appearing in the main diagonal is equal to the
rank of B. □

The proof of Theorem 10.1 provides a thoroughly practical method for find-
ing a non-singular P such that PᵀBP is a diagonal matrix. The first problem




is to find an α₁′ such that q(α₁′) ≠ 0. The range of choices for such an α₁′ is
generally so great that there is no difficulty in finding a suitable choice by
trial and error. For the same reason, any systematic method for finding
an α₁′ must be a matter of personal preference.

Among other possibilities, an efficient system for finding an α₁′ is the
following: First try α₁′ = α₁. If q(α₁) = b₁₁ = 0, try α₁′ = α₂. If q(α₂) =
b₂₂ = 0, then q(α₁ + α₂) = q(α₁) + 2f_s(α₁, α₂) + q(α₂) = 2f_s(α₁, α₂) = 2b₁₂ so
that it is convenient to try α₁′ = α₁ + α₂. The point of making this sequence
of trials is that the outcome of each is determined by the value of a single
element of B. If all three of these fail, then we can pass our attention to
α₃, α₁ + α₃, and α₂ + α₃ with similar ease and proceed in this fashion.

Now, with the chosen α₁′, f_s(α₁′, α) defines a linear functional φ₁′ on V.
If α₁′ is represented by (p₁₁, . . . , pₙ₁) and α by (x₁, . . . , xₙ), then

f_s(α₁′, α) = Σ_{i=1}^n Σ_{j=1}^n pᵢ₁bᵢⱼxⱼ = Σ_{j=1}^n (Σ_{i=1}^n pᵢ₁bᵢⱼ)xⱼ.    (10.2)

This means that the linear functional φ₁′ is represented by [p₁₁ · · · pₙ₁]B.

The next step described in the proof is to determine the subspace W₁
annihilated by φ₁′. However, it is not necessary to find all of W₁. It is
sufficient to find an α₂′ ∈ W₁ such that q(α₂′) ≠ 0. With this α₂′, f_s(α₂′, α)
defines a linear functional φ₂′ on V. If α₂′ is represented by (p₁₂, . . . , pₙ₂),
then φ₂′ is represented by [p₁₂ · · · pₙ₂]B.

The next subspace we need is the subspace W₂ of W₁ annihilated by φ₂′.
Thus W₂ is the subspace annihilated by both φ₁′ and φ₂′. We then select an
α₃′ from W₂ and proceed as before.
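
The procedure just described can be carried out mechanically. The following
Python sketch (our own implementation of the method as described, with our
own names and tolerances) is one way to do it; it handles the degenerate case
where q vanishes on the remaining subspace by taking any basis of that
subspace for the remaining columns.

```python
import numpy as np

def diagonalize_symmetric(B, tol=1e-12):
    """Find P with P.T @ B @ P diagonal by the linear-functional method:
    pick a vector with q != 0, record its functional a^T B, restrict to the
    annihilated subspace, and repeat.  A numerical sketch over the reals."""
    B = np.asarray(B, dtype=float)
    n = B.shape[0]
    cols, funcs = [], []                     # chosen alpha_k' and phi_k' = alpha_k'^T B
    while True:
        if funcs:                            # W = subspace annihilated by the phi_k'
            _, s, Vt = np.linalg.svd(np.array(funcs))
            W = Vt[np.sum(s > tol):]
        else:
            W = np.eye(n)
        if len(W) == 0:
            break
        # try basis vectors of W, then sums of pairs, exactly as in the text
        cand = list(W) + [W[i] + W[j] for i in range(len(W)) for j in range(i + 1, len(W))]
        a = next((c for c in cand if abs(c @ B @ c) > tol), None)
        if a is None:                        # q vanishes on W: the form is zero there
            cols.extend(W)
            break
        cols.append(a)
        funcs.append(a @ B)
    return np.column_stack(cols)

B = np.array([[0., 1, 2], [1, 0, 1], [2, 1, 0]])
P = diagonalize_symmetric(B)
print(np.round(P.T @ B @ P, 10))   # a diagonal matrix (the entries depend on the choices made)
```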

Let us illustrate the entire procedure with an example. Consider

B = ⎡0 1 2⎤
    ⎢1 0 1⎥
    ⎣2 1 0⎦ .

Since b₁₁ = b₂₂ = 0, we take α₁′ = α₁ + α₂ = (1, 1, 0). Then the linear
functional φ₁′ is represented by

[1 1 0]B = [1 1 3].

A possible choice for an α₂′ annihilated by this linear functional is (1, −1, 0).
The linear functional φ₂′ determined by (1, −1, 0) is represented by

[1 −1 0]B = [−1 1 1].

We should have checked to see that q(α₂′) ≠ 0, but it is easier to make that
check after determining the linear functional φ₂′ since q(α₂′) = φ₂′α₂′ =
−2 ≠ 0 and the arithmetic of evaluating the quadratic form includes all
the steps involved in determining φ₂′.






We must now find an α₃′ annihilated by φ₁′ and φ₂′. This amounts to solving
the system of homogeneous linear equations represented by

⎡ 1 1 3⎤
⎣−1 1 1⎦ .

A possible choice is α₃′ = (−1, −2, 1). The corresponding linear func-
tional φ₃′ is represented by

[−1 −2 1]B = [0 0 −4].

The desired matrix of transition is

P = ⎡1  1 −1⎤
    ⎢1 −1 −2⎥
    ⎣0  0  1⎦ .

Since the linear functionals we have calculated along the way are the rows
of PᵀB, the calculation of PᵀBP is half completed. Thus,

PᵀBP = ⎡ 1  1  3⎤ ⎡1  1 −1⎤   ⎡2  0  0⎤
       ⎢−1  1  1⎥ ⎢1 −1 −2⎥ = ⎢0 −2  0⎥
       ⎣ 0  0 −4⎦ ⎣0  0  1⎦   ⎣0  0 −4⎦ .
It is possible to modify the diagonal form by multiplying the elements in
the main diagonal by squares from F. Thus, if F is the field of rational
numbers we can obtain the diagonal {2, −2, −1}. If F is the field of real
numbers we can get the diagonal {1, −1, −1}. If F is the field of com-
plex numbers we can get the diagonal {1, 1, 1}.
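
The arithmetic of the example is easily confirmed numerically (our own
quick check, not part of the text):

```python
import numpy as np

B = np.array([[0., 1, 2], [1, 0, 1], [2, 1, 0]])
P = np.array([[1., 1, -1], [1, -1, -2], [0, 0, 1]])   # columns are alpha_1', alpha_2', alpha_3'
print(P.T @ B @ P)          # diag(2, -2, -4), as computed above
```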

Since the matrix of transition P is a product of elementary matrices, the
diagonal form PᵀBP can also be obtained by a sequence of elementary
row and column operations, provided the sequence of column operations
is exactly the same as the sequence of row operations. This method is
commonly used to obtain the diagonal form under congruence. If an
element bᵢᵢ in the main diagonal is non-zero, it can be used to reduce all other
elements in row i and column i to zero. If every element in the main diagonal
is zero and bᵢⱼ ≠ 0, then adding row j to row i and column j to column i
will yield a matrix with 2bᵢⱼ in the ith place of the main diagonal. The method
is a little fussy because the same row and column operations must be used,
and in the same order.

Another good method for quadratic forms of low order is called com-
pleting the square. If XᵀBX = Σ_{i,j=1}^n xᵢbᵢⱼxⱼ and bᵢᵢ ≠ 0, then

XᵀBX − (1/bᵢᵢ)(bᵢ₁x₁ + · · · + bᵢₙxₙ)²    (10.3)






is a quadratic form in which xᵢ does not appear. Make the substitution

xᵢ′ = bᵢ₁x₁ + · · · + bᵢₙxₙ.    (10.4)

Continue in this manner if possible. The steps must be modified if at any
stage every element in the main diagonal is zero. If bᵢⱼ ≠ 0, then the sub-
stitution xᵢ′ = xᵢ + xⱼ and xⱼ′ = xᵢ − xⱼ will yield a quadratic form repre-
sented by a matrix with 2bᵢⱼ in the ith place of the main diagonal and −2bᵢⱼ
in the jth place. Then we can proceed as before. In the end we will have

XᵀBX = (1/b₁₁)(x₁′)² + · · ·    (10.5)

expressed as a sum of squares; that is, the quadratic form will be in diagonal
form.
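
As a small illustration of the step (10.3) (our own example, not the text's),
sympy can confirm the completing-the-square computation for the form
q = x₁² + 4x₁x₂ + 3x₂²:

```python
import sympy as sp

# Here b11 = 1 and b12 = 2, so (1/b11)(b11*x1 + b12*x2)^2 = (x1 + 2*x2)^2.
x1, x2 = sp.symbols('x1 x2')
q = x1**2 + 4*x1*x2 + 3*x2**2
remainder = sp.expand(q - (x1 + 2*x2)**2)
print(remainder)            # -x2**2: a form in which x1 no longer appears
```

With the substitution x₁′ = x₁ + 2x₂, x₂′ = x₂ this gives q = (x₁′)² − (x₂′)²,
already in diagonal form.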

The method of elementary row and column operations and the method 
of completing the square have the advantage of being based on concepts 
much less sophisticated than the linear functional. However, the com- 
putational method based on the proof of the theorem is shorter, faster, 
and more compact. It has the additional advantage of giving the matrix 
of transition without special effort. 



EXERCISES 

1. Reduce each of the following symmetric matrices to diagonal form. Use the
method of linear functionals, the method of elementary row and column operations,
and the method of completing the square.

(a) ⎡0  2  2⎤        (b) ⎡1  2  3⎤
    ⎢2  1 −2⎥            ⎢2  0 −1⎥
    ⎣2 −2  0⎦            ⎣3 −1  1⎦

(c) ⎡ 1 −1  2⎤       (d) ⎡0 1 2 3⎤
    ⎢−1  1 −1⎥           ⎢1 0 1 2⎥
    ⎣ 2 −1  1⎦           ⎢2 1 0 1⎥
                         ⎣3 2 1 0⎦



2. Using the methods of this section, reduce the quadratic forms of Exercise 1, 
Section 9, to diagonal form. 

3. Each of the quadratic forms considered in Exercise 2 has integral coefficients. 
Obtain for each a diagonal form in which each coefficient in the main diagonal is 
a square-free integer. 




11 | Real Quadratic Forms

A quadratic form over the complex numbers is not really very interesting.
From Theorem 10.2 we see that two different quadratic forms would be
distinguishable if and only if they had different ranks. Two quadratic forms
of the same rank each have coordinate systems (very likely a different
coordinate system for each) in which their representations are the same.
Hence, any properties they might have which would be independent of the
coordinate system would be indistinguishable.

In this section let us restrict our attention to quadratic forms over the
field of real numbers. In this case, not every number is a square; for
example, −1 is not a square. Therefore, having obtained a diagonalized
representation of a quadratic form, we cannot effect a further transformation,
as we did in the proof of Theorem 10.2, to obtain all 1's for the non-zero
elements of the main diagonal. The best we can do is to change the positive
elements to +1's and the negative elements to −1's. There are many choices
for a basis with respect to which the representation of the quadratic form has
only +1's and −1's along the main diagonal. We wish to show that the
number of +1's and the number of −1's are independent of the choice of the
basis; that is, these numbers are basic properties of the underlying quadratic
form and not peculiarities of the representing matrix.

Theorem 11.1. Let q be a quadratic form over the real numbers. Let P be
the number of positive terms in a diagonalized representation of q and let N
be the number of negative terms. In any other diagonalized representation of
q the number of positive terms is P and the number of negative terms is N.

proof. Let A = {α₁, . . . , αₙ} be a basis which yields a diagonalized
representation of q with P positive terms and N negative terms in the main
diagonal. Without loss of generality we can assume that the first P elements
of the main diagonal are positive. Let B = {β₁, . . . , βₙ} be another basis
yielding a diagonalized representation of q with the first P′ elements of the
main diagonal positive.

Let U = ⟨α₁, . . . , α_P⟩ and let W = ⟨β_{P′+1}, . . . , βₙ⟩. Because of the
form of the representation using the basis A, for any non-zero α ∈ U we have
q(α) > 0. Similarly, for any β ∈ W we have q(β) ≤ 0. This shows that
U ∩ W = {0}. Now dim U = P, dim W = n − P′, and dim (U + W) ≤ n.
Thus P + n − P′ = dim U + dim W = dim (U + W) + dim (U ∩ W) =
dim (U + W) ≤ n. Hence, P − P′ ≤ 0. In the same way it can be shown
that P′ − P ≤ 0. Thus P = P′ and N = r − P = r − P′ = N′. □

Definition. The number S = P − N is called the signature of the quadratic
form q. Theorem 11.1 shows that S is well defined. A quadratic form is
called non-negative semi-definite if S = r. It is called positive definite if S = n.
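
For a real symmetric matrix the rank and signature can be read off from the
signs of its eigenvalues, since the spectral theorem (proved later) gives a
diagonalization by an orthogonal matrix of transition, which is a congruence.
The sketch below (our own, with our own tolerance) uses this:

```python
import numpy as np

def rank_and_signature(B, tol=1e-10):
    """Rank r = P + N and signature S = P - N of a real symmetric matrix B,
    counted from the signs of its eigenvalues.  A numerical sketch only."""
    w = np.linalg.eigvalsh(np.asarray(B, dtype=float))
    P, N = int(np.sum(w > tol)), int(np.sum(w < -tol))
    return P + N, P - N

print(rank_and_signature([[0, 1, 2], [1, 0, 1], [2, 1, 0]]))   # (3, -1), matching diag {2, -2, -4} above
```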




It is clear that a quadratic form is non-negative semi-definite if and only
if q(α) ≥ 0 for all α ∈ V. It is positive definite if and only if q(α) > 0 for all
non-zero α ∈ V. These are the properties of non-negative semi-definite
and positive definite forms that make them of interest. We use them ex-
tensively in Chapter V.

If the field of constants is a subfield of the real numbers, but not the
real numbers, we may not always be able to obtain +1's and −1's along
the main diagonal of a diagonalized representation of a quadratic form.
However, the statement of Theorem 11.1 and its proof referred only to the
diagonal terms as being positive or negative, not necessarily +1 or −1.
Thus the theorem is equally valid in a subfield of the real numbers, and the
definitions of the signature, non-negative semi-definiteness, and positive
definiteness have meaning.

In calculus it is shown that

∫_{−∞}^{∞} e^{−x²} dx = π^{1/2}.

It happens that analogous integrals of the form

I = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} e^{−Σ xᵢaᵢⱼxⱼ} dx₁ · · · dxₙ

appear in a number of applications. The term Σ xᵢaᵢⱼxⱼ = XᵀAX appearing
in the exponent is a quadratic form, and we can assume it to be symmetric.
In order that the integrals converge it is necessary and sufficient that the
quadratic form be positive definite. There is a non-singular matrix P such
that PᵀAP = L is a diagonal matrix. Let {λ₁, . . . , λₙ} be the main diagonal
of L. If X = (x₁, . . . , xₙ) are the old coordinates of a point, then Y =
(y₁, . . . , yₙ) are the new coordinates, where xᵢ = Σⱼ pᵢⱼyⱼ. Since ∂xᵢ/∂yⱼ = pᵢⱼ,
the Jacobian of the coordinate transformation is det P. Thus,

I = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} e^{−Σ λᵢyᵢ²} det P dy₁ · · · dyₙ
  = det P ∫_{−∞}^{∞} e^{−λ₁y₁²} dy₁ · · · ∫_{−∞}^{∞} e^{−λₙyₙ²} dyₙ
  = det P · π^{n/2} / (λ₁ · · · λₙ)^{1/2}.

Since λ₁ · · · λₙ = det L = det Pᵀ det A det P = (det P)² det A, we have

I = π^{n/2} / (det A)^{1/2}.
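
The formula can be checked numerically for a small positive definite A
(our own rough grid approximation, not part of the text):

```python
import numpy as np

# Compare a grid approximation of the double integral with pi / sqrt(det A) for n = 2.
A = np.array([[2.0, 1.0], [1.0, 3.0]])              # a positive definite 2 x 2 matrix
t = np.linspace(-8, 8, 801)
X, Y = np.meshgrid(t, t)
Q = A[0, 0]*X**2 + 2*A[0, 1]*X*Y + A[1, 1]*Y**2     # the quadratic form X^T A X on the grid
I = np.trapz(np.trapz(np.exp(-Q), t), t)
print(I, np.pi / np.sqrt(np.linalg.det(A)))         # both approximately 1.405
```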
EXERCISES 

1. Determine the rank and signature of each of the quadratic forms of Exercise 1,
Section 9.

2. Show that the quadratic form Q(x, y) = ax² + bxy + cy² (a, b, c real) is
positive definite if and only if a > 0 and b² − 4ac < 0.

3. Show that if A is a real symmetric positive definite matrix, then there exists
a real non-singular matrix P such that A = PᵀP.

4. Show that if A is a real non-singular matrix, then AᵀA is positive definite.

5. Show that if A is a real symmetric non-negative semi-definite matrix (that is,
A represents a non-negative semi-definite quadratic form), then there exists a
real matrix R such that A = RᵀR.

6. Show that if A is real, then AᵀA is non-negative semi-definite.

7. Show that if A is real and AᵀA = 0, then A = 0.

8. Show that if A is real symmetric and A² = 0, then A = 0.

9. If A₁, . . . , Aᵣ are real symmetric matrices, show that

A₁² + · · · + Aᵣ² = 0

implies A₁ = A₂ = · · · = Aᵣ = 0.

12 | Hermitian Forms

For the applications of forms to many problems, it turns out that a
quadratic form obtained from a bilinear form over the complex numbers
is not the most useful generalization of the concept of a quadratic form
over the real numbers. As we see later, the property that a quadratic form
over the real numbers be positive definite is a very useful property. While
x² is positive definite for real x, it is not positive definite for complex x.
When dealing with complex numbers we need a function like |x|² = x̄x,
where x̄ is the conjugate complex of x. x̄x is non-negative for all complex
(and real) x, and it is zero only when x = 0. Thus x̄x is a form which has
the property of being positive definite. In the spirit of these considerations,
the following definition is appropriate.

Definition. Let F be the field of complex numbers, or a subfield of the
complex numbers, and let V be a vector space over F. A scalar valued



function f of two vectors α, β ∈ V is called a Hermitian form if

(1) f(α, β) = [f(β, α)]‾,    (12.1)
(2) f(α, b₁β₁ + b₂β₂) = b₁f(α, β₁) + b₂f(α, β₂).

A Hermitian form differs from a symmetric bilinear form in the taking
of the conjugate complex when the roles of the vectors α and β are inter-
changed. But the appearance of the conjugate complex also affects the
bilinearity of the form. Namely,

f(a₁α₁ + a₂α₂, β) = [f(β, a₁α₁ + a₂α₂)]‾
    = [a₁f(β, α₁) + a₂f(β, α₂)]‾
    = ā₁[f(β, α₁)]‾ + ā₂[f(β, α₂)]‾
    = ā₁f(α₁, β) + ā₂f(α₂, β).    (12.2)

We describe this situation by saying that a Hermitian form is linear in the
second variable and conjugate linear in the first variable.

Accordingly, it is also convenient to define a more appropriate general-
ization to vector spaces over the complex numbers of the concept of a
bilinear form on vector spaces over the real numbers. A function of two
vectors on a vector space over the complex numbers is said to be conjugate
bilinear if it is conjugate linear in the first variable and linear in the second.
We say that a function of two vectors is Hermitian symmetric if f(α, β) =
[f(β, α)]‾. This is the most useful generalization to vector spaces over the
complex numbers of the concept of symmetry for vector spaces over the
real numbers. In this terminology a Hermitian form is a Hermitian sym-
metric conjugate bilinear form.

For a given Hermitian form f, we define q(α) = f(α, α) and obtain what
we call a Hermitian quadratic form. In dealing with vector spaces over the
field of complex numbers we almost never meet a quadratic form obtained
from a bilinear form. The useful quadratic forms are the Hermitian quadratic
forms.

Let A = {α₁, . . . , αₙ} be any basis of V. Then we can let f(αᵢ, αⱼ) = hᵢⱼ
and obtain the matrix H = [hᵢⱼ] representing the Hermitian form f with
respect to A. H has the property that hᵢⱼ = f(αᵢ, αⱼ) = [f(αⱼ, αᵢ)]‾ = h̄ⱼᵢ,
and any matrix which has this property can be used to define a Hermitian
form. Any matrix with this property is called a Hermitian matrix.

If A is any matrix, we denote by Ā the matrix obtained by taking the
conjugate complex of every element of A; that is, if A = [aᵢⱼ], then Ā = [āᵢⱼ].
We denote Āᵀ by A*. In this notation a matrix H is Hermitian if and
only if H* = H.

If a new basis B = {β₁, . . . , βₙ} is selected, we obtain the representation






H' = [h'^] where h' tJ = f(fa, fa). Let P be the matrix of transition; that 

is > fa = SU Pa** Then 

K t =m,h) 



(n n \ 

2 Pki*k, 2 PsM 
n in 



= 2P«i2 Pkif(*k,*s) 
s =l fc=l 

« n 
= Z, .2, PkihjcsPsj- 



(12.3) 



In matrix form this equation becomes //' = P*HP. 

Definition. If a non-singular matrix P exists such that H' = P*HP, we say 
that H and H' are Hermitian congruent. 

Theorem 12.1. For a given Hermitian matrix H there is a non-singular 
matrix P such that H' = P*HP is a diagonal matrix. In other words, iff is 
the underlying Hermitian form, there is basis A' = {a[, . . . , a^} such that 
/(<x t ', a J) = whenever i ^ j. 

proof. The proof is almost identical with the proof of Theorem 10.1, the
corresponding theorem for bilinear forms. There is but one place where 
a modification must be made. In the proof of Theorem 10.1 we made use of 
a formula for recovering the symmetric part of a bilinear form from the 
associated quadratic form. For Hermitian forms the corresponding formula 
is 

¼[q(α + β) − q(α − β) − i q(α + iβ) + i q(α − iβ)] = f(α, β).    (12.4)

Hence, if f is not identically zero, there is an α₁ ∈ V such that q(α₁) ≠ 0.
The rest of the proof of Theorem 10.1 then applies without change. □ 
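Formula (12.4) can be checked numerically. The sketch below assumes numpy (an illustrative addition, not part of the text) and verifies the identity for a randomly chosen Hermitian matrix H, with f(α, β) = α*Hβ and q(α) = f(α, α).

# Numerical check of (12.4), assuming numpy.
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
H = (H + H.conj().T) / 2               # make H Hermitian

def f(a, b):
    return a.conj() @ H @ b            # conjugate linear in the first variable

def q(a):
    return f(a, a)

a = rng.normal(size=3) + 1j * rng.normal(size=3)
b = rng.normal(size=3) + 1j * rng.normal(size=3)

lhs = 0.25 * (q(a + b) - q(a - b) - 1j * q(a + 1j * b) + 1j * q(a - 1j * b))
print(np.isclose(lhs, f(a, b)))        # True: (12.4) recovers f from q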

Again, the elements of the diagonal matrix thus obtained are not unique. 
We can transform H' into still another diagonal matrix by means of a 
diagonal matrix Q with x₁, . . . , x_n, x_i ≠ 0, along the main diagonal. In this
fashion we obtain

H″ = Q*H′Q = diag(x̄₁h′₁₁x₁, . . . , x̄_n h′_{nn} x_n) = diag(|x₁|² h′₁₁, . . . , |x_n|² h′_{nn}).    (12.5)






We see that, even though we are dealing with complex numbers, this trans- 
formation multiplies the elements along the main diagonal of H' by positive 
real numbers. 

Since q(α) = f(α, α) = \overline{f(α, α)}, q(α) is always real. We can, in fact, apply
without change the discussion we gave for the real quadratic forms. Let 
P denote the number of positive terms in the diagonal representation of q, 
and let N denote the number of negative terms in the main diagonal. The 
number S = P — N is called the signature of the Hermitian quadratic 
form q. Again, P + N = r, the rank of q. 

The proof that the signature of a Hermitian quadratic form is independent 
of the particular diagonalized representation is identical with the proof 
given for real quadratic forms. 

A Hermitian quadratic form is called non-negative semi-definite if S = r. 
It is called positive definite if S = n. If f is a Hermitian form whose associated
Hermitian quadratic form q is positive-definite (non-negative semi-definite), 
we say that the Hermitian form / is positive-definite (non-negative semi- 
definite). 

A Hermitian matrix can be reduced to diagonal form by a method analo- 
gous to the method described in Section 10, as is shown by the proof of 
Theorem 12.1. A modification must be made because the associated Her- 
mitian form is not bilinear, but complex bilinear. 

Let α′₁ be a vector for which q(α′₁) ≠ 0. With this fixed α′₁, f(α′₁, α) defines
a linear functional φ′₁ on V. If α′₁ is represented by

(p₁, . . . , p_n) = P and α by (x₁, . . . , x_n) = X,

then

f(α′₁, α) = Σ_{i=1}^n Σ_{j=1}^n p̄_i h_{ij} x_j
          = Σ_{j=1}^n (Σ_{i=1}^n p̄_i h_{ij}) x_j.    (12.6)

This means the linear functional φ′₁ is represented by P*H.
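As a numerical illustration of Hermitian congruence (not from the text; numpy is assumed), one admissible choice of the non-singular matrix P of Theorem 12.1 is the unitary matrix of eigenvectors of H, for which P*HP is diagonal with real entries. The method of Section 10, adapted as above, would produce a different P and a different diagonal matrix, in keeping with the non-uniqueness just discussed.

# A sketch of reducing a Hermitian matrix to diagonal form by a congruence
# H' = P*HP, assuming numpy.  Here P is taken to be the unitary eigenvector
# matrix returned by eigh; this is only one admissible choice of P.
import numpy as np

H = np.array([[2.0, 1 - 1j],
              [1 + 1j, 3.0]])                        # Hermitian

eigenvalues, P = np.linalg.eigh(H)                   # columns: orthonormal eigenvectors
H_prime = P.conj().T @ H @ P                         # the congruence P*HP

print(np.allclose(H_prime, np.diag(eigenvalues)))    # True: H' is diagonal
print(np.allclose(H_prime.imag, 0))                  # diagonal entries are real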

EXERCISES 

1. Reduce the following Hermitian matrices to diagonal form.

(a)  [ 1     −i ]          (b)  [ 1       1 − i ]
     [ i      1 ]               [ 1 + i     1   ]

2. Let f be an arbitrary complex bilinear form. Define f* by the rule f*(α, β) =
\overline{f(β, α)}. Show that f* is complex bilinear.




3. Show that if H is a positive definite Hermitian matrix — that is, H represents 
a positive definite Hermitian form — then there exists a non-singular matrix P such 
that H = P*P. 

4. Show that if A is a complex non-singular matrix, then A* A is a positive 
definite Hermitian matrix. 

5. Show that if H is a Hermitian non-negative semi-definite matrix — that is, H 
represents a non-negative semi-definite Hermitian quadratic form — then there 
exists a complex matrix R such that H = R*R. 

6. Show that if A is complex, then A* A is Hermitian non-negative semi-definite. 

7. Show that if A is complex and A* A = 0, then A = 0. 

8. Show that if A is Hermitian and A² = 0, then A = 0.

9. If A₁, . . . , A_r are Hermitian matrices, show that A₁² + · · · + A_r² = 0
implies A₁ = · · · = A_r = 0.

10. Show by an example that, if A and B are Hermitian, it is not necessarily
true that AB is Hermitian. What is true if A and B are Hermitian and AB = BA?



chapter V

Orthogonal and unitary transformations, normal matrices



In this chapter we introduce an inner product based on an arbitrary positive 
definite symmetric bilinear form, or Hermitian form. On this basis the 
length of a vector and the concept of orthogonality can be defined. From 
this point on, we concentrate our attention on bases in which the vectors are 
mutually orthogonal and each is of length 1 , the orthonormal bases. The 
Gram-Schmidt process for obtaining an orthonormal basis from an arbitrary 
basis is described. 

Isometries are linear transformations which preserve length. They also 
preserve the inner product and therefore map orthonormal bases onto 
orthonormal bases. It is shown that a matrix representing an isometry has 
exactly the same properties as a matrix of transition representing a change 
of bases from one orthonormal basis to another. If the field of scalars is 
real, these matrices are said to be orthogonal; and if the field of scalars is 
complex, they are said to be unitary. 

If A is an orthogonal matrix, we show that A^T = A^{-1}; and if A is unitary,
we show that A* = A^{-1}. Because of this fact a matrix representing a linear
transformation and a matrix representing a bilinear form are transformed 
by exactly the same formula under a change of coordinates provided that 
the change is from one orthonormal basis to another. This observation 
unifies the discussions of Chapters III and IV.

The penalty for restricting our attention to orthonormal bases is that 
there is a corresponding restriction in the linear transformations and bilinear 
forms that can be represented by diagonal matrices. The necessary and 
sufficient condition that this be possible, expressed in terms of matrices, 
is that A*A = AA*. Matrices with this property are called normal matrices.
Fortunately, the normal matrices constitute a large class of matrices and 





they happen to include as special cases most of the types that arise in physical 
problems. 

Up to a certain point we can consider matrices with real coefficients to 
be special cases of matrices with complex coefficients. However, if we wish 
to restrict our attention to real vector spaces, then the matrices of transition 
must also be real. This restriction means that the situation for real vector 
spaces is not a special case of the situation for complex vector spaces. In 
particular, there are real normal matrices that are unitary similar to diagonal 
matrices but not orthogonal similar to diagonal matrices. The necessary 
and sufficient condition that a real matrix be orthogonal similar to a diagonal 
matrix is that it be symmetric. 

The techniques for finding the diagonal normal form of a normal matrix 
and the unitary or orthogonal matrix of transition are, for the most part, 
not new. The eigenvalues and eigenvectors are found as in Chapter III. We 
show that eigenvectors corresponding to different eigenvalues are automati- 
cally orthogonal so all that needs to be done is to make sure that they are of 
length 1. However, something more must be done in the case of multiple 
eigenvalues. We are assured that there are enough eigenvectors, but we 
must make sure they are orthogonal. The Gram-Schmidt process provides 
the method for finding the necessary orthonormal eigenvectors.

1 I Inner Products and Orthogonal Bases 

Even when speaking in abstract terms we have tried to draw an analogy 
between vector spaces and the geometric spaces we have encountered in 
2- and 3-dimensional analytic geometry. For example, we have referred to 
lines and planes through the origin as subspaces ; however, we have nowhere 
used the concept of distance. Some of the most interesting properties of 
vector spaces and matrices deal with the concept of distance. So in this 
chapter we introduce the concept of distance and explore the related proper- 
ties. 

For aesthetic reasons, and to show as clearly as possible that we need not 
have an a priori concept of distance, we use an approach which will emphasize 
the arbitrary nature of the concept of distance. 

It is customary to restrict attention to the field of real numbers or the field 
of complex numbers when discussing vector space concepts related to dis- 
tance. However, we need not be quite that restrictive. The scalar field F 
must be a subfield of the complex numbers with the property that, if a ∈ F,
the conjugate complex ā is also in F. Such a field is said to be normal over its
real subfield. The real field and the complex field have this property, but 
so do many other fields. For most of the important applications of the mate- 
rial to follow the field of scalars is taken to be the real numbers or the field 




of complex numbers. Although most of the proofs given will be valid for 
any field normal over its real subfield, it will suffice to think in terms of the 
two most important cases. 

In a vector space V of dimension n over the complex numbers (or a subfield 
of the complex numbers normal over its real subfield), let f be any fixed
positive definite Hermitian form. For the purpose of the following develop-
ment it does not matter which positive definite Hermitian form is chosen,
but it will remain fixed for all the remaining discussion. Since this particular
Hermitian form is now fixed, we write (α, β) instead of f(α, β). (α, β) is
called the inner product, or scalar product, of α and β.

Since we have chosen a positive definite Hermitian form, (α, α) ≥ 0
and (α, α) > 0 unless α = 0. Thus √(α, α) = ‖α‖ is a well-defined non-
negative real number which we call the length or norm of α. Observe that
‖aα‖ = √(aα, aα) = √(āa(α, α)) = |a| · ‖α‖, so that multiplying a vector
by a scalar a multiplies its length by |a|. We say that the distance between
two vectors is the norm of their difference; that is, d(α, β) = ‖β − α‖.
We should like to show that this distance function has the properties we 
might reasonably expect a distance function to have. But first we have to 
prove a theorem that has interest of its own and many applications. 

Theorem 1.1. For any vectors α, β ∈ V, |(α, β)| ≤ ‖α‖ · ‖β‖. This in-
equality is known as Schwarz's inequality.

proof. For t a real number consider the inequality

0 ≤ ‖(α, β)tα − β‖² = |(α, β)|² ‖α‖² t² − 2t |(α, β)|² + ‖β‖².    (1.1)

If ‖α‖ = 0, the fact that this inequality must hold for arbitrarily large t
implies that |(α, β)| = 0 so that Schwarz's inequality is satisfied. If ‖α‖ ≠ 0,
take t = 1/‖α‖². Then (1.1) is equivalent to Schwarz's inequality,

|(α, β)| ≤ ‖α‖ · ‖β‖. □    (1.2)

This proof of Schwarz's inequality does not make use of the assumption 
that the inner product is positive definite and would remain valid if the 
inner product were merely semi-definite. Using the assumption that the 
inner product is positive definite, however, an examination of this proof of 
Schwarz's inequality would reveal that equality can hold if and only if 

β − ((α, β)/(α, α)) α = 0,    (1.3)

that is, if and only if β is a multiple of α.

If α ≠ 0 and β ≠ 0, Schwarz's inequality can be written in the form

|(α, β)| / (‖α‖ · ‖β‖) ≤ 1.    (1.4)




In vector analysis the scalar product of two vectors is equal to the product 
of the lengths of the vectors times the cosine of the angle between them. 
The inequality (1.4) says, in effect, that in a vector space over the real 

numbers the ratio (α, β)/(‖α‖ · ‖β‖) can be considered to be a cosine. It would be
a diversion for us to push this point much further. We do, however, wish
to show that d(α, β) behaves like a distance function.

Theorem 1.2. For d(α, β) = ‖β − α‖, we have

(1) d(α, β) = d(β, α),

(2) d(α, β) ≥ 0 and d(α, β) = 0 if and only if α = β,

(3) d(α, β) ≤ d(α, γ) + d(γ, β).

proof. (1) and (2) are obvious. (3) follows from Schwarz's inequality.
To see this, observe that

‖α + β‖² = (α + β, α + β)
         = (α, α) + (α, β) + (β, α) + (β, β)
         = ‖α‖² + (α, β) + \overline{(α, β)} + ‖β‖²
         ≤ ‖α‖² + 2|(α, β)| + ‖β‖²
         ≤ ‖α‖² + 2‖α‖ · ‖β‖ + ‖β‖² = (‖α‖ + ‖β‖)².    (1.5)

Replacing α by γ − α and β by β − γ, we have

‖β − α‖ ≤ ‖γ − α‖ + ‖β − γ‖. □    (1.6)

(3) is the familiar triangular inequality. It implies that the sum of two small 
vectors is also small. Schwarz's inequality tells us that the inner product 
of two small vectors is small. Both of these inequalities are very useful for 
these reasons. 
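Both inequalities are easy to illustrate numerically. The sketch below assumes numpy (an illustrative addition, not part of the text) and checks Schwarz's inequality and the triangle inequality for a pair of randomly chosen complex vectors under the standard inner product on Cⁿ.

# Numerical check of Schwarz's inequality and the triangle inequality,
# assuming numpy.
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(size=4) + 1j * rng.normal(size=4)
b = rng.normal(size=4) + 1j * rng.normal(size=4)

inner = np.vdot(a, b)                  # conjugates the first argument
norm_a = np.linalg.norm(a)
norm_b = np.linalg.norm(b)

print(abs(inner) <= norm_a * norm_b + 1e-12)                 # Schwarz
print(np.linalg.norm(a + b) <= norm_a + norm_b + 1e-12)      # triangle inequality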

According to Theorem 12.1 of Chapter IV and the definition of a positive 
definite Hermitian form, there exists a basis A = {α₁, . . . , α_n} with respect
to which the representing matrix is the unit matrix. Thus,

(α_i, α_j) = δ_{ij}.    (1.7)

Relative to this fixed positive definite Hermitian form, the inner product, 
every set of vectors that has this property is called an orthonormal set. 
The word "orthonormal" is a combination of the words "orthogonal" 
and "normal." Two vectors a and fi are said to be orthogonal if (a, /9) = 
(jS, a) = 0. A vector a is normalized if it is of length 1 ; that is, if (a, a) = 
1. Thus the vectors of an orthonormal set are mutually orthogonal and nor- 
malized. The basis A chosen above is an orthonormal basis. We shall see 
that orthonormal bases possess particular advantages for dealing with the 
properties of a vector space with an inner product. A vector space over the 
complex numbers with an inner product such as we have defined is called 




a unitary space. A vector space over the real numbers with an inner product 
is called a Euclidean space. 

For α, β ∈ V, let α = Σ_{i=1}^n x_i α_i and β = Σ_{i=1}^n y_i α_i. Then

(α, β) = (Σ_{i=1}^n x_i α_i, Σ_{j=1}^n y_j α_j)
       = Σ_{i=1}^n x̄_i [Σ_{j=1}^n y_j (α_i, α_j)]
       = Σ_{i=1}^n x̄_i y_i.    (1.8)

If we represent α by the n-tuple (x₁, . . . , x_n) = X, and β by the n-tuple
(y₁, . . . , y_n) = Y, the inner product can be written in the form

(α, β) = Σ_{i=1}^n x̄_i y_i = X*Y.    (1.9)

This is a familiar formula in vector analysis where it is also known as the 
inner product, scalar, or dot product. 
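In numerical work formula (1.9) is exactly what numpy's vdot computes (numpy is assumed here for illustration; it is not part of the text): vdot conjugates its first argument, matching the convention that the inner product is conjugate linear in the first variable.

# Formula (1.9) in numpy: (alpha, beta) = X*Y = sum of conj(x_i) * y_i.
import numpy as np

X = np.array([1 + 1j, 2j, 3.0])
Y = np.array([2.0, 1 - 1j, 1j])

by_formula = np.sum(np.conj(X) * Y)
by_vdot = np.vdot(X, Y)                 # conjugates the first argument

print(np.isclose(by_formula, by_vdot))  # True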

Theorem 1.3. An orthonormal set is linearly independent. 

proof. Suppose that {ξ₁, ξ₂, . . . } is an orthonormal set and that Σ_i x_i ξ_i =
0. Then 0 = (ξ_j, 0) = (ξ_j, Σ_i x_i ξ_i) = Σ_i x_i (ξ_j, ξ_i) = x_j. Thus the set is
linearly independent. □

It is an immediate consequence of Theorem 1.3 that an orthonormal set 
cannot contain more than n elements. 

Since V has at least one orthonormal basis and orthonormal sets are 
linearly independent, some questions naturally arise. Are there other 
orthonormal bases ? Can an orthonormal set be extended to an orthonormal 
basis ? Can a linearly independent set be modified to form an orthonormal 
set? For infinite dimensional vector spaces the question of the existence 
of even one orthonormal basis is a non-trivial question. For finite dimen- 
sional vector spaces all these questions have nice answers, and the technique 
employed in giving these answers is of importance in infinite dimensional 
vector spaces as well. 

Theorem 1.4. If A = {α₁, . . . , α_s} is any linearly independent set whatever
in V, there exists an orthonormal set X = {ξ₁, . . . , ξ_s} such that ξ_k = Σ_{i=1}^k a_{ik} α_i.

proof. (The Gram-Schmidt orthonormalization process.) Since α₁ is
an element of a linearly independent set, α₁ ≠ 0, and therefore ‖α₁‖ > 0.

Let ξ₁ = (1/‖α₁‖) α₁. Clearly, ‖ξ₁‖ = 1.

Suppose, then, {ξ₁, . . . , ξ_r} has been found so that it is an orthonormal
set and such that each ξ_k is a linear combination of {α₁, . . . , α_k}. Let

α′_{r+1} = α_{r+1} − (ξ₁, α_{r+1})ξ₁ − · · · − (ξ_r, α_{r+1})ξ_r.    (1.10)




Then for any ξ_i, 1 ≤ i ≤ r, we have

(ξ_i, α′_{r+1}) = (ξ_i, α_{r+1}) − (ξ_i, α_{r+1}) = 0.    (1.11)

Furthermore, since each ξ_k is a linear combination of the {α₁, . . . , α_k},
α′_{r+1} is a linear combination of the {α₁, . . . , α_{r+1}}. Also, α′_{r+1} is not zero
since {α₁, . . . , α_{r+1}} is a linearly independent set and the coefficient of
α_{r+1} in the representation of α′_{r+1} is 1. Thus we can define

ξ_{r+1} = (1/‖α′_{r+1}‖) α′_{r+1}.

Clearly, {ξ₁, . . . , ξ_{r+1}} is an orthonormal set with the desired properties.
We can continue in this fashion until we exhaust the elements of A. The
set X = {ξ₁, . . . , ξ_s} has the required properties. □

The Gram-Schmidt process is completely effective and the computations 
can be carried out exactly as they are given in the proof of Theorem 1.4. 
For example, let A = {α₁ = (1, 1, 0, 1), α₂ = (3, 1, 1, −1), α₃ = (0, 1, −1,
1)}. Then

ξ₁ = (1/√3)(1, 1, 0, 1),

α′₂ = (3, 1, 1, −1) − (3/√3)(1/√3)(1, 1, 0, 1) = (2, 0, 1, −2),

ξ₂ = ⅓(2, 0, 1, −2),

α′₃ = (0, 1, −1, 1) − (2/√3)(1/√3)(1, 1, 0, 1) − (−3/3)(⅓)(2, 0, 1, −2)
    = ⅓(0, 1, −2, −1),

ξ₃ = (1/√6)(0, 1, −2, −1).

It is easily verified that {ξ₁, ξ₂, ξ₃} is an orthonormal set.
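The computation in this example can be organized as a short program. The following sketch assumes numpy (an illustrative addition, not part of the text); the function gram_schmidt is a hypothetical helper that follows the proof of Theorem 1.4 step by step and reproduces ξ₂ and ξ₃ above.

# A minimal Gram-Schmidt sketch, assuming numpy.  vdot conjugates its first
# argument, which matches (xi_k, alpha) in the proof, so the same code works
# for complex data.
import numpy as np

def gram_schmidt(vectors):
    """Return an orthonormal list obtained from a linearly independent list."""
    basis = []
    for a in vectors:
        a_prime = a.astype(complex)
        for xi in basis:
            a_prime = a_prime - np.vdot(xi, a) * xi   # subtract (xi, a) xi
        basis.append(a_prime / np.linalg.norm(a_prime))
    return basis

A = [np.array([1.0, 1.0, 0.0, 1.0]),
     np.array([3.0, 1.0, 1.0, -1.0]),
     np.array([0.0, 1.0, -1.0, 1.0])]

xi1, xi2, xi3 = gram_schmidt(A)
print(np.allclose(xi2, np.array([2, 0, 1, -2]) / 3))             # matches xi_2 above
print(np.allclose(xi3, np.array([0, 1, -2, -1]) / np.sqrt(6)))   # matches xi_3 above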

Corollary 1.5. If A = {α₁, . . . , α_n} is a basis of V, the orthonormal set
X = {ξ₁, . . . , ξ_n}, obtained from A by the application of the Gram-Schmidt
process, is an orthonormal basis of V.

proof. Since X is orthonormal it is linearly independent. Since it 
contains n vectors it also spans V and is a basis. □ 

Theorem 1.4 and its corollary are used in much the same fashion in which 
we used Theorem 3.6 of Chapter I to obtain a basis (in this case an ortho- 
normal basis) such that a subset spans a given subspace. 




Theorem 1.6. Given any vector α₁ of length 1, there is an orthonormal
basis with α₁ as the first element.

proof. Since the set {α₁} is linearly independent it can be extended to a
basis with α₁ as the first element. Now, when the Gram-Schmidt process
is applied, the first vector, being of length 1, is unchanged and becomes the
first vector of an orthonormal basis. □



EXERCISES 

In the following problems we assume that all n-tuples are representations of 
their vectors with respect to orthonormal bases. 

1. Let A = {α₁, . . . , α₄} be an orthonormal basis of R⁴ and let α, β ∈ V be
represented by (1, 2, 3, −1) and (2, 4, −1, 1), respectively. Compute (α, β).

2. Let α = (1, i, 1 + i) and β = (i, 1, i − 1) be vectors in C³, where C is the
field of complex numbers. Compute (α, β).

3. Show that the set {(1, i, 2), (1, i, −1), (1, −i, 0)} is orthogonal in C³.

4. Show that (α, 0) = (0, α) = 0 for all α ∈ V.

5. Show that ‖α + β‖² + ‖α − β‖² = 2‖α‖² + 2‖β‖².

6. Show that if the field of scalars is real and ‖α‖ = ‖β‖, then α − β and α + β
are orthogonal, and conversely.

7. Show that if the field of scalars is real and ‖α + β‖² = ‖α‖² + ‖β‖², then
α and β are orthogonal, and conversely.

8. Verify Schwarz's inequality for the vectors α and β in Exercises 1 and 2.

9. The set {(1, −1, 1), (2, 0, 1), (0, 1, 1)} is linearly independent, and hence a
basis for F³. Apply the Gram-Schmidt process to obtain an orthonormal basis.

10. Given the basis {(1, 0, 1, 0), (1, 1, 0, 0), (0, 1, 1, 1), (0, 1, 1, 0)} apply the
Gram-Schmidt process to obtain an orthonormal basis.

11. Let W be a subspace of V spanned by {(0, 1, 1, 0), (0, 5, −3, −2), (−3,
−3, 5, −7)}. Find an orthonormal basis for W.

12. In the space of real integrable functions let the inner product be defined by

(f, g) = ∫ f(x)g(x) dx.

Find a polynomial of degree 2 orthogonal to 1 and x. Find a polynomial of degree
3 orthogonal to 1, x, and x². Are these two polynomials orthogonal?

13. Let X = {ξ₁, . . . , ξ_m} be a set of vectors in the n-dimensional space V.
Consider the matrix G = [g_{ij}] where

g_{ij} = (ξ_i, ξ_j).

Show that if X is linearly dependent, then the columns of G are also linearly
dependent. Show that if X is linearly independent, then the columns of G are also
linearly independent. Det G is known as the Gramian of the set X. Show that X is
linearly dependent if and only if det G = 0. Choose an orthonormal basis in V and
represent the vectors in X with respect to that basis. Show that G can be represented
as the product of an m × n matrix and an n × m matrix. Show that det G ≥ 0.



*2 I Complete Orthonormal Sets 

We now develop some properties of orthonormal sets that hold in both 
finite and infinite dimensional vector spaces. These properties are deep 
and important in infinite dimensional vector spaces, but in finite dimensional 
vector spaces they could easily be developed in passing and without special 
terminology. It is of some interest, however, to borrow the terminology of 
infinite dimensional vector spaces and to give proofs, where possible, which 
are valid in infinite as well as finite dimensional vector spaces. 

Let X = {ξ₁, ξ₂, . . . } be an orthonormal set and let α be any vector in V.
The numbers {a_i = (ξ_i, α)} are called the Fourier coefficients of α.

There is, first, the question of whether an expression like Σ_i x_i ξ_i has any
meaning in cases where infinitely many of the x_i are non-zero. This is a
question of the convergence of an infinite series and the problem varies
from case to case so that we cannot hope to deal with it in all generality.
We have to assume for this discussion that all expressions like Σ_i x_i ξ_i that
we write down have meaning.

Theorem 2.1. The minimum of ‖α − Σ_i x_i ξ_i‖ is attained if and only if all
x_i = (ξ_i, α) = a_i.

proof.

‖α − Σ_i x_i ξ_i‖² = (α − Σ_i x_i ξ_i, α − Σ_i x_i ξ_i)
  = (α, α) − Σ_i x_i ā_i − Σ_i x̄_i a_i + Σ_i x̄_i x_i
  = Σ_i ā_i a_i − Σ_i x_i ā_i − Σ_i x̄_i a_i + Σ_i x̄_i x_i + (α, α) − Σ_i ā_i a_i
  = Σ_i (ā_i − x̄_i)(a_i − x_i) + ‖α‖² − Σ_i ā_i a_i
  = Σ_i |a_i − x_i|² + ‖α‖² − Σ_i |a_i|².    (2.1)

Only the term Σ_i |a_i − x_i|² depends on the x_i and, being a sum of real
squares, it takes on its minimum value of zero if and only if all x_i = a_i. □

Theorem 2.1 is valid for any orthonormal set X, whether it is a basis or
not. If the norm is used as a criterion of smallness, then the theorem says
that the best approximation of α in the form Σ_i x_i ξ_i (using only the ξ_i ∈ X)
is obtained if and only if all x_i are the Fourier coefficients.




Theorem 2.2. Σ_i |a_i|² ≤ ‖α‖². This inequality is known as Bessel's in-
equality.

proof. Setting x_i = a_i in equation (2.1) we have

‖α‖² − Σ_i |a_i|² = ‖α − Σ_i a_i ξ_i‖² ≥ 0. □    (2.2)
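Theorems 2.1 and 2.2 can be illustrated numerically. In the sketch below (numpy assumed; not part of the text) X is an orthonormal set that is not a basis, the Fourier coefficients give a better approximation than an arbitrary choice of the x_i, and Bessel's inequality holds.

# Best approximation by Fourier coefficients and Bessel's inequality,
# assuming numpy.
import numpy as np

xi1 = np.array([1.0, 0.0, 0.0, 0.0])
xi2 = np.array([0.0, 1.0, 1.0, 0.0]) / np.sqrt(2)
X = [xi1, xi2]                                   # an orthonormal set, not a basis

alpha = np.array([1.0, 2.0, 3.0, 4.0])
coeffs = [np.vdot(xi, alpha) for xi in X]        # Fourier coefficients (xi_i, alpha)

best = sum(c * xi for c, xi in zip(coeffs, X))   # approximation with a_i
worse = 1.0 * xi1 + 1.0 * xi2                    # some other choice of x_i

print(np.linalg.norm(alpha - best) <= np.linalg.norm(alpha - worse))   # Theorem 2.1
print(sum(abs(c) ** 2 for c in coeffs) <= np.linalg.norm(alpha) ** 2)  # Bessel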

It is desirable to know conditions under which the Fourier coefficients 
will represent the vector α. This means we would like to have α = Σ_i a_i ξ_i.
In a finite dimensional vector space the most convenient sufficient con-
dition is that X be an orthonormal basis. In the theory of Fourier series
and other orthogonal functions it is generally not possible to establish
the validity of an equation like α = Σ_i a_i ξ_i without some modification of
what is meant by convergence or a restriction on the set of functions under
consideration. Instead, we usually establish a condition known as com- 
pleteness. An orthonormal set is said to be complete if and only if it is not 
a subset of a larger orthonormal set. 

Theorem 2.3. Let X = {ξ_i} be an orthonormal set. The following three
conditions are equivalent:

(1) For each α, β ∈ V, (α, β) = Σ_i \overline{(ξ_i, α)}(ξ_i, β).    (2.3)

(2) For each α ∈ V, ‖α‖² = Σ_i |(ξ_i, α)|².    (2.4)

(3) X is complete.

Equations (2.3) and (2.4) are both known as Parseval's identities.
proof. Assume (1). Then ‖α‖² = (α, α) = Σ_i \overline{(ξ_i, α)}(ξ_i, α) = Σ_i |(ξ_i, α)|²,
which is (2).

Assume (2). If X were not complete, it would be contained in a larger
orthonormal set Y. But for any α₀ ∈ Y, α₀ ∉ X, we would have

1 = ‖α₀‖² = Σ_i |(ξ_i, α₀)|² = 0

because of (2) and the assumption that Y is orthonormal. Thus X is complete.
Now, assume (3). Let β be any vector in V and consider β′ = β − Σ_i (ξ_i, β)ξ_i.
Then

(ξ_j, β′) = (ξ_j, β − Σ_i (ξ_i, β)ξ_i)
         = (ξ_j, β) − Σ_i (ξ_i, β)(ξ_j, ξ_i)
         = (ξ_j, β) − (ξ_j, β) = 0;

that is, β′ is orthogonal to all ξ_i ∈ X. If ‖β′‖ ≠ 0, then X ∪ {(1/‖β′‖)β′}
would be a larger orthonormal set. Hence, ‖β′‖ = 0. Using the assumption
that the inner product is positive definite we can now conclude that β′ = 0.
However, it is not necessary to use this assumption and we prefer to avoid
using it. What we really need to conclude is that if α is any vector in V then
(α, β′) = 0, and this follows from Schwarz's inequality. Thus we have

0 = (α, β′) = (α, β − Σ_i (ξ_i, β)ξ_i)
  = (α, β) − Σ_i (ξ_i, β)(α, ξ_i)

or

(α, β) = Σ_i \overline{(ξ_i, α)}(ξ_i, β).

This completes the cycle of implications and proves that conditions (1),
(2), and (3) are equivalent. □

Theorem 2.4. The following two conditions are equivalent:

(4) The only vector orthogonal to all vectors in X is the zero vector.

(5) For each α ∈ V, α = Σ_i (ξ_i, α)ξ_i.    (2.5)

proof. Assume (4). Let α be any vector in V and consider α′ = α −
Σ_i (ξ_i, α)ξ_i. Then

(ξ_j, α′) = (ξ_j, α − Σ_i (ξ_i, α)ξ_i)
         = (ξ_j, α) − Σ_i (ξ_i, α)(ξ_j, ξ_i)
         = (ξ_j, α) − (ξ_j, α) = 0;

that is, α′ is orthogonal to all ξ_i ∈ X. Thus α′ = 0 and α = Σ_i (ξ_i, α)ξ_i.
Now, assume (5) and let α be orthogonal to all ξ_i ∈ X. Then α =
Σ_i (ξ_i, α)ξ_i = 0. □

Theorem 2.5. The conditions (4) or (5) imply the conditions (1), (2), and
(3).

proof. Assume (5). Then

(α, β) = (Σ_i (ξ_i, α)ξ_i, Σ_j (ξ_j, β)ξ_j)
       = Σ_i \overline{(ξ_i, α)} Σ_j (ξ_j, β)(ξ_i, ξ_j)
       = Σ_i \overline{(ξ_i, α)}(ξ_i, β). □

Theorem 2.6. If the inner product is positive definite, the conditions (1),
(2), or (3) imply the conditions (4) and (5).

proof. In the proof that (3) implies (1) we showed that if α′ = α −
Σ_i (ξ_i, α)ξ_i, then ‖α′‖ = 0. If the inner product is positive definite, then
α′ = 0 and, hence,

α = Σ_i (ξ_i, α)ξ_i. □

The proofs of Theorems 2.3, 2.4, and 2.5 did not make use of the positive 
definiteness of the inner product and they remain valid if the inner product 
is merely non-negative semi-definite. Theorem 2.6 depends critically on 
the fact that the inner product is positive definite. 

For finite dimensional vector spaces we always assume that the inner 
product is positive definite so that the three conditions of Theorem 2.3 and 
the two conditions of Theorem 2.4 are equivalent. The point of our making 
a distinction between these two sets of conditions is that there are a number 
of important inner products in infinite dimensional vector spaces that are 
not positive definite. For example, the inner product that occurs in the 
theory of Fourier series is of the form 

(α, β) = (1/π) ∫_{−π}^{π} \overline{α(x)} β(x) dx.    (2.6)

This inner product is non-negative semi-definite, but not positive definite if 
V is the set of integrable functions. Hence, we cannot pass from the com- 
pleteness of the set of orthogonal functions to a theorem about the con- 
vergence of a Fourier series to the function from which the Fourier 
coefficients were obtained. 

In using theorems of this type in infinite dimensional vector spaces in 
general and Fourier series in particular, we proceed in the following manner. 
We show that any α ∈ V can be approximated arbitrarily closely by finite
sums of the form Σ_i x_i ξ_i. For the theory of Fourier series this theorem is
known as the Weierstrass approximation theorem. A similar theorem must 
be proved for other sets of orthogonal functions. This implies that the 
minimum mentioned in Theorem 2.1 must be zero. This in turn implies that 
condition (2) of Theorem 2.3 holds. Thus Parseval's equation, which is 
equivalent to the completeness of an orthonormal set, is one of the principal 
theorems of any theory of orthogonal functions. Condition (5), which is the 
convergence of a Fourier series to the function which it represents, would 
follow if the inner product were positive definite. Unfortunately, this is 
usually not the case. To get the validity of condition (5) we must either add 
further conditions or introduce a different type of convergence. 

EXERCISES 

1. Show that if X is an orthonormal basis of a finite dimensional vector space, 
then condition (5) holds. 




2. Let X be a finite set of mutually orthogonal vectors in V. Suppose that the 
only vector orthogonal to each vector in X is the zero vector. Show that X is a 
basis of V. 

3 I The Representation of a Linear Functional by an Inner Product 

For a fixed vector β ∈ V, (β, α) is a linear function of α. Thus there is a
linear functional φ ∈ V̂ such that φ(α) = (β, α) for all α. We denote the linear
functional defined in this way by φ_β. The following theorem is a converse
of this observation.

Theorem 3.1. Given a linear functional φ ∈ V̂, there exists a unique η ∈ V
such that φ(α) = (η, α) for all α ∈ V.

proof. Let X = {ξ₁, . . . , ξ_n} be an orthonormal basis of V, and let
X̂ = {φ₁, . . . , φ_n} be the dual basis. Let φ ∈ V̂ have the representation
φ = Σ_{i=1}^n y_i φ_i. Define η = Σ_{i=1}^n ȳ_i ξ_i. Then for each ξ_j, (η, ξ_j) =
(Σ_{i=1}^n ȳ_i ξ_i, ξ_j) = Σ_{i=1}^n y_i (ξ_i, ξ_j) = y_j = Σ_{i=1}^n y_i φ_i(ξ_j) = φ(ξ_j). But then
φ(α) and (η, α) are both linear functionals on V that coincide on the basis,
and hence coincide on all of V.

If η₁ and η₂ are two choices such that (η₁, α) = (η₂, α) = φ(α) for all
α ∈ V, then (η₁ − η₂, α) = 0 for all α ∈ V. For α = η₁ − η₂ this means
(η₁ − η₂, η₁ − η₂) = 0. Hence, η₁ − η₂ = 0 and the choice for η is
unique. □

Call the mapping defined by this theorem η; that is, for each φ ∈ V̂,
η(φ) ∈ V has the property that φ(α) = (η(φ), α) for all α ∈ V.

Theorem 3.2. The correspondence between φ ∈ V̂ and η(φ) ∈ V is one-to-one
and onto V.

proof. In Theorem 3.1 we have already shown that η(φ) is well defined.
Let β be any vector in V and let φ_β be the linear functional in V̂ such that
φ_β(α) = (β, α) for all α. Then β = η(φ_β) and the mapping is onto. Since
(β, α), as a function of α, determines a unique linear functional φ_β, the
correspondence is one-to-one. □

Theorem 3.3. If the inner product is symmetric, η is an isomorphism of V̂
onto V.

proof. We have already shown in Theorem 3.2 that η is one-to-one and
onto. Let φ = Σ_i b_i φ_i and consider β = Σ_i b_i η(φ_i). Then (β, α) =
(Σ_i b_i η(φ_i), α) = (α, Σ_i b_i η(φ_i)) = Σ_i b_i (α, η(φ_i)) = Σ_i b_i (η(φ_i), α) =
Σ_i b_i φ_i(α) = φ(α). Thus η(φ) = β = Σ_i b_i η(φ_i) and η is linear. □

Notice that η is not linear if the scalar field is complex and the inner
product is Hermitian. Then for φ = Σ_i b_i φ_i we consider γ = Σ_i b̄_i η(φ_i).
We see that (γ, α) = (Σ_i b̄_i η(φ_i), α) = Σ_i b_i (η(φ_i), α) = Σ_i b_i φ_i(α) = φ(α).




Thus η(φ) = γ = Σ_i b̄_i η(φ_i) and η is conjugate linear. It should be observed
that even when η is conjugate linear it maps subspaces of V̂ onto subspaces
of V.

We describe this situation by saying that we can "represent a linear func-
tional by an inner product." Notice that although we made use of a particular
basis to specify the η corresponding to φ, the uniqueness shows that this
choice is independent of the basis used. If V is a vector space over the real
numbers, φ and η happen to have the same coordinates. This happy coin-
cidence allows us to represent V̂ in V and make V do double duty. This fact is
exploited in courses in vector analysis. In fact, it is customary to start
immediately with inner products in real vector spaces with orthonormal
bases and not to mention V̂ at all. All is well as long as things remain simple.
As soon as things get a little more complicated, it is necessary to separate
the structure of V̂ superimposed on V. The vectors representing themselves
in V are said to be contravariant and the vectors representing linear functionals
in V are said to be covariant.

We can see from the proof of Theorem 3.1 that, if V is a vector space
over the complex numbers, φ and the corresponding η will not necessarily
have the same coordinates. In fact, there is no choice of a basis for which
each φ and its corresponding η will have the same coordinates.

Let us examine the situation when the basis chosen in V is not orthonormal.
Let A = {α₁, . . . , α_n} be any basis of V, and let Â = {ψ₁, . . . , ψ_n} be the
corresponding dual basis of V̂. Let b_{ij} = (α_i, α_j). Since the inner product is
Hermitian, b_{ij} = \overline{b_{ji}}, or [b_{ij}] = B = B*. Since the inner product is positive
definite, B has rank n. That is, B is non-singular. Let φ = Σ_{k=1}^n c_k ψ_k be an
arbitrary linear functional in V̂. What are the coordinates of the correspond-
ing η? Let η = Σ_{i=1}^n y_i α_i. Then

(η, α_j) = (Σ_{i=1}^n y_i α_i, α_j)
        = Σ_{i=1}^n ȳ_i (α_i, α_j)
        = Σ_{i=1}^n ȳ_i b_{ij}
        = Σ_{k=1}^n c_k ψ_k(α_j)
        = c_j.    (3.1)

Thus, we have to solve the equations

Σ_{i=1}^n ȳ_i b_{ij} = Σ_{i=1}^n b_{ij} ȳ_i = c_j,    j = 1, . . . , n.    (3.2)




In matrix form this becomes

B̄Ȳ = C^T,

where

C = [c₁ · · · c_n],

or

Y = B^{-1}C* = (CB^{-1})*.    (3.3)

Of course this means that it is rather complicated to obtain the coordinate
representation of η from the coordinate representation of φ. But that is
not the cause for all the fuss about covariant and contravariant vectors.
After all, we have shown that η corresponds to φ independently of the basis
used and the coordinates of η transform according to the same rules that
apply to any other vector in V. The real difficulty stems from the insistence
upon using (1.9) as the definition of the inner product, instead of using a
definition not based upon coordinates.

If η = Σ_{i=1}^n y_i α_i and ξ = Σ_{j=1}^n x_j α_j, we see that

(η, ξ) = (Σ_{i=1}^n y_i α_i, Σ_{j=1}^n x_j α_j)
       = Σ_{i=1}^n Σ_{j=1}^n ȳ_i b_{ij} x_j
       = Y*BX.    (3.4)

Thus, if η represents the linear functional φ, we have

(η, ξ) = Y*BX
       = (CB^{-1})BX
       = CX
       = (C*)*X.    (3.5)

Elementary treatments of vector analysis prefer to use C* as the repre-
sentation of η. This preference is based on the desire to use (1.9) as the
definition of the inner product so that (3.5) is the representation of (η, ξ),
rather than to use a coordinate-free definition which would lead to (η, ξ)
being represented by (3.4). The elements of C* are called the covariant
components of η. We obtained C by representing φ in V̂. Since the dual space
is not available in such an elementary treatment, some kind of artifice must
be used. It is then customary to introduce a reciprocal basis A* = {α*₁, . . . , α*_n},
where α*_i has the property (α*_i, α_j) = δ_{ij} = ψ_i(α_j). A* is the representation
of the dual basis Â in V. But C was the original representation of φ in
terms of the dual basis. Thus, the insistence upon representing linear
functionals by the inner product does not result in a single computational
advantage. The confusion that it introduces is a severe price to pay to avoid
introducing linear functionals and the dual space at the beginning.
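Formula (3.3) is easy to carry out numerically. The following sketch assumes numpy (an illustrative addition, not part of the text): the columns of M hold the coordinates of a non-orthonormal basis with respect to some orthonormal basis, B = M*M is the matrix [b_{ij}], C holds the coordinates of φ in the dual basis, and the computed η satisfies (η, α_j) = c_j.

# Computing the representing vector eta for a functional phi, assuming numpy.
import numpy as np

M = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1j]])          # a non-orthonormal basis (columns)
B = M.conj().T @ M                       # Gram matrix, B = B*

C = np.array([1.0, 2 - 1j, 3.0])         # c_j = phi(alpha_j)
Y = np.linalg.solve(B, C.conj())         # Y = B^{-1} C*, coordinates of eta

eta = M @ Y                              # eta in orthonormal coordinates
for j in range(3):
    print(np.isclose(np.vdot(eta, M[:, j]), C[j]))   # (eta, alpha_j) = c_j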




4 I The Adjoint Transformation 

Theorem 4.1. For a given linear transformation σ on V, there is a unique
linear transformation σ* on V such that (σ*(α), β) = (α, σ(β)) for all α, β ∈ V.

proof. Let σ be given. Then for a fixed α, (α, σ(β)) is a linear function
of β, that is, a linear functional on V. By Theorem 3.1 there is a unique η ∈ V
such that (α, σ(β)) = (η, β). Define σ*(α) to be this η.

Now, (a₁α₁ + a₂α₂, σ(β)) = ā₁(α₁, σ(β)) + ā₂(α₂, σ(β)) = ā₁(σ*(α₁), β) +
ā₂(σ*(α₂), β) = (a₁σ*(α₁) + a₂σ*(α₂), β) so that a₁σ*(α₁) + a₂σ*(α₂) =
σ*(a₁α₁ + a₂α₂) and σ* is linear. □

Since for each α the choice for σ*(α) is unique, σ* is uniquely defined by σ.
σ* is called the adjoint of σ.

Theorem 4.2. The relation between σ and σ* is symmetric; that is,
(σ*)* = σ.

proof. Let σ be given. Then σ* is defined uniquely by (σ*(α), β) =
(α, σ(β)) for all α, β ∈ V. Then (σ*)*, which we denote by σ**, is defined
by (σ**(α), β) = (α, σ*(β)) for all α, β ∈ V. Now the inner product is
Hermitian so that (σ**(α), β) = (α, σ*(β)) = \overline{(σ*(β), α)} = \overline{(β, σ(α))} =
(σ(α), β). Thus σ**(α) = σ(α) for all α ∈ V; that is, σ** = σ. It then
follows also that (σ(α), β) = (α, σ*(β)). □

Let A = [a_{ij}] be the matrix representing σ with respect to an orthonormal
basis X = {ξ₁, . . . , ξ_n} and let us find the matrix representing σ*.

(σ*(ξ_j), ξ_k) = (ξ_j, σ(ξ_k))
             = (ξ_j, Σ_{i=1}^n a_{ik} ξ_i)
             = Σ_{i=1}^n a_{ik} (ξ_j, ξ_i)
             = a_{jk}
             = (Σ_{i=1}^n ā_{ji} ξ_i, ξ_k).    (4.1)

Since this equation holds for all ξ_k, σ*(ξ_j) = Σ_{i=1}^n ā_{ji} ξ_i. Thus σ* is repre-
sented by the conjugate transpose of A; that is, σ* is represented by A*.
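The fact that A* represents the adjoint with respect to an orthonormal basis can be checked numerically. The sketch below assumes numpy (not part of the text) and verifies (σ*(α), β) = (α, σ(β)) in coordinates, where vdot conjugates its first argument.

# Numerical check that the conjugate transpose represents the adjoint,
# assuming numpy.
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
A_star = A.conj().T

x = rng.normal(size=3) + 1j * rng.normal(size=3)
y = rng.normal(size=3) + 1j * rng.normal(size=3)

print(np.isclose(np.vdot(A_star @ x, y), np.vdot(x, A @ y)))   # True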

The adjoint σ* is closely related to the dual σ̂ defined on page 142. σ is a
linear transformation of V into itself, so the dual σ̂ is a linear transformation
of V̂ into itself. Since η establishes a one-to-one correspondence between
V̂ and V, we can define a mapping of V into itself corresponding to σ̂ on



V̂. For any α ∈ V we can map α onto η{σ̂[η^{-1}(α)]} and denote this map-
ping by η(σ̂). Then for any α, β ∈ V we have

(η(σ̂)(α), β) = (η{σ̂[η^{-1}(α)]}, β)
            = σ̂[η^{-1}(α)](β)
            = [η^{-1}(α)][σ(β)]
            = (α, σ(β))
            = (σ*(α), β).    (4.2)

Hence, η(σ̂)(α) = σ*(α) for all α ∈ V, that is, η(σ̂) = σ*. The adjoint is a
representation of the dual. Because the mapping η of V̂ onto V is conjugate
linear instead of linear, and because the vectors in V̂ are represented by row
matrices while those in V are represented by columns, the matrix representing
σ* is the transpose of the complex conjugate of the matrix representing σ.
Thus σ* is represented by A*.

We shall maintain the distinction between the dual σ̂ defined on V̂ and the
adjoint σ* defined on V. This distinction is not always made and quite often
both terms are used for both purposes. Actually, this confusion seldom
causes any trouble. However, it can cause trouble when discussing the matrix
representation of σ̂ or σ*. If σ is represented by A, we have chosen also to
represent σ̂ by A with respect to the dual basis. If we had chosen to represent
linear functionals by columns instead of rows, σ̂ would have been repre-
sented by A^T. It would have been represented by A^T in either the real or the
complex case. But the adjoint σ* is represented by A*. No convention will
allow σ̂ and σ* to be represented by the same matrix in the complex case
because the mapping η is conjugate linear. Because of this we have chosen to
make clear the distinction between σ̂ and σ*, even to the extent of having the
matrix representations look different. Furthermore, the use of rows to
represent linear functionals has the advantage of making some of the formulas
look simpler. However, this is purely a matter of choice and taste, and other
conventions, used consistently, would serve as well.

Since we now have a model of V̂ in V, we can carry over into V all the
terminology and theorems on linear functionals in Chapter IV. In particular,
we see that an orthonormal basis can also be considered to be its own dual
basis since (ξ_i, ξ_j) = δ_{ij}.

Recall that, when a basis is changed in V and P is the matrix of transition,
(P^T)^{-1} is the matrix of transition for the dual bases in V̂. In mapping V̂
onto V, \overline{(P^T)^{-1}} = (P*)^{-1} becomes the matrix of transition for the representa-
tion of the dual basis in V. Since an orthonormal basis is dual to itself, if P is the
matrix of transition from one orthonormal basis to another, then P must also
be the matrix of transition for the dual basis; that is, (P*)^{-1} = P. This
important property of the matrices of transition from one orthonormal basis
to another will be established independently in Section 6.




Let W be a subset of V. In Chapter IV-4, we defined W^⊥ to be the anni-
hilator of W in V̂. The mapping η of V̂ onto V maps W^⊥ onto a subspace
of V. It is easily seen that η(W^⊥) is precisely the set of all vectors orthogonal
to every vector in W. Since we are in the process of dropping V̂ as a separate
space and identifying it with V, we denote the set of all vectors in V orthogonal
to all vectors in W by W^⊥ and call it the annihilator of W.

Theorem 4.3. If W is a subspace of dimension p, W^⊥ is of dimension
n − p. W ∩ W^⊥ = {0}. W ⊕ W^⊥ = V.

proof. That W^⊥ is of dimension n − p follows from Theorem 4.1 of
Chapter IV. The other two assertions had no meaning in the context of
Chapter IV. If α ∈ W ∩ W^⊥, then ‖α‖² = (α, α) = 0 so that α = 0.
Since dim (W + W^⊥) = dim W + dim W^⊥ − dim (W ∩ W^⊥) = p +
(n − p) = n, W ⊕ W^⊥ = V. □

When Wi and W 2 are subspaces of V such that their sum is direct and W x 
and W 2 are also orthogonal, we use the notation W 1 ± W 2 to denote this sum. 
Actually, the fact that the sum is direct is a consequence of the fact that the 
subspaces are orthogonal. In this notation, the direct sum in the con- 
clusion of Theorem 4.3 takes the form V = W ⊥ W^⊥.

Theorem 4.4. Let W be a subspace invariant under σ. W^⊥ is then invariant
under σ*.

proof. Let α ∈ W^⊥. Then, for any β ∈ W, (σ*(α), β) = (α, σ(β)) = 0
since σ(β) ∈ W. Thus σ*(α) ∈ W^⊥. □

Theorem 4.5. K(σ*) = Im(σ)^⊥.

proof. By definition (α, σ(β)) = (σ*(α), β). (α, σ(β)) = 0 for all β ∈ V
if and only if α ∈ Im(σ)^⊥, and (σ*(α), β) = 0 for all β ∈ V if and only if
α ∈ K(σ*). Thus K(σ*) = Im(σ)^⊥. □

Theorem 4.5 here is equivalent to Theorem 5.3 of Chapter IV. 

Theorem 4.6. If S and T are subspaces of V, then (S + T)^⊥ = S^⊥ ∩ T^⊥
and (S ∩ T)^⊥ = S^⊥ + T^⊥.

proof. This theorem is equivalent to Theorem 4.4 of Chapter IV. □

Theorem 4.7. For each conjugate bilinear form f, there is a linear trans-
formation σ such that f(α, β) = (α, σ(β)) for all α, β ∈ V.

proof. For a fixed α ∈ V, f(α, β) is linear in β. Thus by Theorem 3.1
there is a unique η ∈ V such that f(α, β) = (η, β) for all β ∈ V. Define
σ*(α) = η. σ* is linear since (σ*(a₁α₁ + a₂α₂), β) = f(a₁α₁ + a₂α₂, β) =
ā₁f(α₁, β) + ā₂f(α₂, β) = ā₁(σ*(α₁), β) + ā₂(σ*(α₂), β) = (a₁σ*(α₁) +
a₂σ*(α₂), β). Let σ** = σ be the linear transformation of which σ* is the
adjoint. Then f(α, β) = (σ*(α), β) = (α, σ(β)). □




We shall call σ the linear transformation associated with the conjugate
bilinear form f. The eigenvalues and eigenvectors of a conjugate bilinear
form are defined to be the eigenvalues and eigenvectors of the associated
linear transformation. Conversely, to each linear transformation σ there
is associated a conjugate bilinear form (α, σ(β)), and we shall also freely
transfer terminology in the other direction. Thus a linear transformation
will be called symmetric, or skew-symmetric, etc., if it is associated with a
symmetric, or skew-symmetric bilinear form.

Theorem 4.8. The conjugate bilinear form f and the linear transformation σ
for which f(α, β) = (α, σ(β)) are represented by the same matrix with respect
to an orthonormal basis.

proof. Let X = {ξ₁, . . . , ξ_n} be an orthonormal basis and let A = [a_{ij}]
be the matrix representing σ with respect to this basis. Then f(ξ_i, ξ_j) =
(ξ_i, σ(ξ_j)) = (ξ_i, Σ_{k=1}^n a_{kj} ξ_k) = Σ_{k=1}^n a_{kj}(ξ_i, ξ_k) = a_{ij}. □

A linear transformation is called self-adjoint if σ* = σ. Clearly, a linear
transformation is self-adjoint if and only if the matrix representing it (with
respect to an orthonormal basis) is Hermitian. However, by means of
Theorem 4.7 self-adjointness of a linear transformation can be related to the
Hermitian character of a conjugate bilinear form without the intervention
of matrices. Namely, if f is a Hermitian form then (σ*(α), β) = (α, σ(β)) =
f(α, β) = \overline{f(β, α)} = \overline{(β, σ(α))} = (σ(α), β).

Theorem 4.9. If σ and τ are linear transformations on V such that
(σ(α), β) = (τ(α), β) for all α, β ∈ V, then σ = τ.

proof. If (σ(α), β) − (τ(α), β) = ((σ − τ)(α), β) = 0 for all α, β, then for
each α and β = (σ − τ)(α) we have ‖(σ − τ)(α)‖² = 0. Hence, (σ − τ)(α) = 0
for all α and σ = τ. □

Corollary 4.10. If σ and τ are linear transformations on V such that
(α, σ(β)) = (α, τ(β)) for all α, β ∈ V, then σ = τ. □

Theorem 4.9 provides an independent proof that the adjoint operator 
σ* is unique. Corollary 4.10 shows that the linear transformation σ corre-
sponding to the bilinear form f such that f(α, β) = (α, σ(β)) is also unique.
Since, in turn, each linear transformation σ defines a bilinear form f by the
formula f(α, β) = (α, σ(β)), this establishes a one-to-one correspondence
between conjugate bilinear forms and linear transformations. 

Theorem 4.11. Let V be a unitary vector space. If σ and τ are linear trans-
formations on V such that (σ(α), α) = (τ(α), α) for all α ∈ V, then σ = τ.

proof. It can be checked that

(σ(α), β) = ¼{(σ(α + β), α + β) − (σ(α − β), α − β)
            − i(σ(α + iβ), α + iβ) + i(σ(α − iβ), α − iβ)}.    (4.3)




It follows from the hypothesis that (σ(α), β) = (τ(α), β) for all α, β ∈ V.
Hence, by Theorem 4.9, σ = τ. □

It is curious to note that this theorem can be proved because of the relation
(4.3), which is analogous to formula (12.4) of Chapter IV. But the analogue
of formula (10.1) in the real case does not yield the same conclusion. In fact,
if V is a vector space over the real numbers and σ is skew-symmetric, then
(σ(α), α) = (α, σ*(α)) = (α, −σ(α)) = −(α, σ(α)) = −(σ(α), α) for all α.
Thus (σ(α), α) = 0 for all α. In the real case the best analogue of this theorem
is that if (σ(α), α) = (τ(α), α) for all α ∈ V, then σ + σ* = τ + τ*, or σ and τ
have the same symmetric part.



EXERCISES 

1. Show that (στ)* = τ*σ*.

2. Show that if σ*σ = 0, then σ = 0.

3. Let σ be a skew-symmetric linear transformation on a vector space over the
real numbers. Show that σ* = −σ.

4. Let f be a skew-Hermitian form—that is, f(α, β) = −\overline{f(β, α)}—and let σ be the
associated skew-Hermitian linear transformation. Show that σ* = −σ.

5. Show that eigenvalues of a real skew-symmetric linear transformation are
either 0 or pure imaginary. Show that the same is true for a skew-Hermitian
linear transformation.

6. For what kind of linear transformation σ is it true that (ξ, σ(ξ)) = 0 for all
ξ ∈ V?

7. For what kind of linear transformation σ is it true that σ(ξ) ∈ ξ^⊥ for all ξ ∈ V?

8. Show that if W is an invariant subspace under σ, then W^⊥ is an invariant
subspace under σ*.

9. Show that if σ is self-adjoint and W is invariant under σ, then W^⊥ is also
invariant under σ.

10. Let π be the projection of V onto S along T. Let π* be the adjoint of π.
Show that π* is the projection of V onto T^⊥ along S^⊥.

11. Let W = σ(V). Show that W^⊥ is the kernel of σ*.

12. Show that σ and σ* have the same rank.

13. Let W = σ(V). Show that σ*(V) = σ*(W).

14. Show that σ*(V) = σ*σ(V). Show that σ(V) = σσ*(V).

15. Show that if σ*σ = σσ*, then σ*(V) = σ(V).

16. Show that if σ*σ = σσ*, then σ and σ* have the same kernel.

17. Show that σ + σ* is self-adjoint.

18. Show that if σ + σ* = 0, then σ is skew-symmetric, or skew-Hermitian.




19. Show that σ − σ* is skew-symmetric, or skew-Hermitian.

20. Show that every linear transformation is the sum of a self-adjoint trans-
formation and a skew-Hermitian transformation.

21. Show that if σσ* = σ*σ, then Im(σ) is an invariant subspace under σ. In
fact, show that σⁿ(V) = σ(V) for all n ≥ 1.

22. Show that if σ is a scalar transformation, that is σ(α) = aα, then σ*(α) = āα.



5 I Orthogonal and Unitary Transformations 

Definition. A linear transformation of V into itself is called an isometry if
it preserves length; that is, σ is an isometry if and only if ‖σ(α)‖ = ‖α‖ for
all α ∈ V. An isometry in a vector space over the real numbers is called an
orthogonal transformation. An isometry in a vector space over the complex 
numbers is called a unitary transformation. We try to save duplication and 
repetition by treating the real and complex cases together whenever possible. 

Theorem 5.1. A linear transformation σ of V into itself is an isometry if
and only if it preserves the inner product; that is, if and only if (α, β) =
(σ(α), σ(β)) for all α, β ∈ V.

proof. Certainly, if σ preserves the inner product then it preserves
length since ‖σ(α)‖² = (σ(α), σ(α)) = (α, α) = ‖α‖².

The converse requires the separation of the real and complex cases. For
an inner product over the real numbers we have

(α, β) = ½{(α + β, α + β) − (α, α) − (β, β)}
       = ½{‖α + β‖² − ‖α‖² − ‖β‖²}.    (5.1)

For an inner product over the complex numbers we have

(α, β) = ¼{(α + β, α + β) − (α − β, α − β)
         − i(α + iβ, α + iβ) + i(α − iβ, α − iβ)}
       = ¼{‖α + β‖² − ‖α − β‖² − i‖α + iβ‖² + i‖α − iβ‖²}.    (5.2)

In either case, any linear transformation which preserves length will preserve
the inner product. □

Theorem 5.2. A linear transformation σ of V into itself is an isometry if
and only if it maps an orthonormal basis onto an orthonormal basis.

proof. It follows immediately from Theorem 5.1 that if σ is an isometry,
then σ maps every orthonormal set onto an orthonormal set and, therefore,
an orthonormal basis onto an orthonormal basis.

On the other hand, let X = {ξ₁, . . . , ξ_n} be any orthonormal basis which
is mapped by σ onto an orthonormal basis {σ(ξ₁), . . . , σ(ξ_n)}. For an
arbitrary vector α ∈ V, α = Σ_{i=1}^n x_i ξ_i, we have

‖σ(α)‖² = (σ(α), σ(α))
        = (Σ_{i=1}^n x_i σ(ξ_i), Σ_{j=1}^n x_j σ(ξ_j))
        = Σ_{i=1}^n Σ_{j=1}^n x̄_i x_j (σ(ξ_i), σ(ξ_j))
        = Σ_{i=1}^n x̄_i x_i = ‖α‖².    (5.3)

Thus σ preserves length and it is an isometry. □

Theorem 5.3. σ is an isometry if and only if σ* = σ^{-1}.

proof. If σ is an isometry, then (σ(α), σ(β)) = (α, β) for all α, β ∈ V. By
the definition of σ*, (α, β) = (σ*[σ(α)], β) = (σ*σ(α), β). Since this
equation holds for all β ∈ V, σ*σ(α) is uniquely defined and σ*σ(α) = α. Thus
σ*σ is the identity transformation, that is, σ* = σ^{-1}.

Conversely, suppose that σ* = σ^{-1}. Then (σ(α), σ(β)) = (σ*[σ(α)], β) =
(σ*σ(α), β) = (α, β) for all α, β ∈ V, and σ is an isometry. □

EXERCISES 

1. Let σ be an isometry and let λ be an eigenvalue of σ. Show that |λ| = 1.

2. Show that the real eigenvalues of an isometry are ±1.

3. Let X = {ξ₁, ξ₂} be an orthonormal basis of V. Find an isometry that maps
ξ₁ onto (1/√2)(ξ₁ + ξ₂).

4. Let X = {ξ₁, ξ₂, ξ₃} be an orthonormal basis of V. Find an isometry that
maps ξ₁ onto ⅓(ξ₁ + 2ξ₂ + 2ξ₃).

6 I Orthogonal and Unitary Matrices 

Let σ be an isometry and let U = [u_{ij}] be a matrix representing σ with
respect to an orthonormal basis X = {ξ₁, . . . , ξ_n}. Since σ is an isometry,
the set X′ = {σ(ξ₁), . . . , σ(ξ_n)} must also be orthonormal. Thus

δ_{jk} = (σ(ξ_j), σ(ξ_k)) = (Σ_{i=1}^n u_{ij} ξ_i, Σ_{s=1}^n u_{sk} ξ_s) = Σ_{i=1}^n ū_{ij} u_{ik}.    (6.1)

This is equivalent to the matrix equation U*U = I, which also follows
from Theorem 5.3.
from Theorem 5.3. 

It is also easily seen that if U*U = I, then a must map an orthonormal 
basis onto an orthonormal basis. By Theorem 5.2 a is then an isometry. 
Thus, 

Theorem 6.1. A matrix U whose elements are complex numbers represents
a unitary transformation (with respect to an orthonormal basis) if and only if
U* = U^{-1}. A matrix with this property is called a unitary matrix. □

If the underlying field of scalars is the real numbers instead of the complex 
numbers, then U is real and U* = U T . Nothing else is really changed and 
we have the corresponding theorem for vector spaces over the real numbers. 

Theorem 6.2. A matrix U whose elements are real numbers represents an
orthogonal transformation (with respect to an orthonormal basis) if and only
if U^T = U^{-1}. A real matrix with this property is called an orthogonal
matrix. □

As is the case in Theorems 6.1 and 6.2, quite a bit of the discussion of 
unitary and orthogonal transformations and matrices is entirely parallel. 
To avoid unnecessary duplication we discuss unitary transformations and 
matrices and leave the parallel discussion for orthogonal transformations 
and matrices implicit. Up to a certain point, an orthogonal matrix can be 
considered to be a unitary matrix that happens to have real entries. This 
viewpoint is not quite valid because a unitary matrix with real coefficients 
represents a unitary transformation, an isometry on a vector space over 
the complex numbers. This viewpoint, however, leads to no trouble until 
we make use of the algebraic closure of the complex numbers, the property 
of complex numbers that every polynomial equation with complex co- 
efficients possesses at least one complex solution. 

It is customary to read equations (6.1) as saying that the columns of U are
orthonormal. Conversely, if the columns of U are orthonormal, then
U* = U^{-1} and U is unitary. Also, U* as a left inverse is also a right inverse;
that is, UU* = I. Thus,

Σ_{k=1}^n u_{ik} ū_{jk} = δ_{ij} = Σ_{k=1}^n ū_{ki} u_{kj}.    (6.2)

Thus U is unitary if and only if the rows of U are orthonormal. Hence,

Theorem 6.3. Unitary and orthogonal matrices are characterized by the
property that their columns are orthonormal. They are equally characterized
by the property that their rows are orthonormal. □
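Theorem 6.3 is easy to verify numerically for a particular matrix. The sketch below assumes numpy (an illustrative addition, not part of the text) and checks both U*U = I and UU* = I for a sample unitary matrix.

# Checking that a matrix with orthonormal columns also has orthonormal rows,
# assuming numpy.
import numpy as np

U = np.array([[1.0, 1.0],
              [1j, -1j]]) / np.sqrt(2)   # a sample unitary matrix

I2 = np.eye(2)
print(np.allclose(U.conj().T @ U, I2))   # columns orthonormal: U*U = I
print(np.allclose(U @ U.conj().T, I2))   # rows orthonormal:    UU* = I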

Theorem 6.4. The product of unitary matrices is unitary. The product of 
orthogonal matrices is orthogonal. 




proof. This follows immediately from the observation that unitary and 
orthogonal matrices represent isometries, and one isometry followed by 
another results in an isometry. □ 

A proof of Theorem 6.4 based on the characterizing property U* = U^{-1}
(or U^T = U^{-1} for orthogonal matrices) is just as brief. Namely, (U₁U₂)* =
U₂*U₁* = U₂^{-1}U₁^{-1} = (U₁U₂)^{-1}.

Now suppose that X = {ξ₁, . . . , ξ_n} and X′ = {ξ′₁, . . . , ξ′_n} are two
orthonormal bases, and that P = [p_{ij}] is the matrix of transition from the
basis X to the basis X′. By definition,

ξ′_j = Σ_{i=1}^n p_{ij} ξ_i.    (6.3)

Thus,

δ_{jk} = (ξ′_j, ξ′_k) = (Σ_{i=1}^n p_{ij} ξ_i, Σ_{s=1}^n p_{sk} ξ_s)
      = Σ_{i=1}^n p̄_{ij} Σ_{s=1}^n p_{sk} (ξ_i, ξ_s)
      = Σ_{i=1}^n p̄_{ij} p_{ik}.

This means the columns of P are orthonormal and P is unitary (or orthogonal).
Thus we have

Theorem 6.5. The matrix of transition from one orthonormal basis to
another is unitary (or orthogonal if the underlying field is real). □

We have seen that two matrices representing the same linear transformation 
with respect to different bases are similar. If the two bases are both ortho- 
normal, then the matrix of transition is unitary (or orthogonal). In this case 
we say that the two matrices are unitary similar (or orthogonal similar). 
The matrices A and A′ are unitary (orthogonal) similar if and only if there
exists a unitary (orthogonal) matrix P such that A′ = P^{-1}AP = P*AP
(A′ = P^{-1}AP = P^TAP).

If H and H' are matrices representing the same conjugate bilinear form 
with respect to different bases, they are Hermitian congruent and there 
exists a non-singular matrix P such that H' = P*HP. P is the matrix of 
transition and, if the two bases are orthonormal, P is unitary. Then H′ =
P*HP = P^{-1}HP. Hence, if we restrict our attention to orthonormal bases
in vector spaces over the complex numbers, we see that matrices representing 
linear transformations and matrices representing conjugate bilinear forms 
transform according to the same rules ; they are unitary similar. 






If B and B' are matrices representing the same real bilinear form with 
respect to different bases, they are congruent and there exists a non-singular 
matrix P such that B′ = P^TBP. P is the matrix of transition and, if the two
bases are orthonormal, P is orthogonal. Then B′ = P^TBP = P^{-1}BP.
Hence, if we restrict our attention to orthonormal bases in vector spaces 
over the real numbers, we see that matrices representing linear transforma- 
tions and matrices representing bilinear forms transform according to the 
same rules; they are orthogonal similar. 

In our earlier discussions of similarity we sought bases with respect to 
which the representing matrix had a simple form, usually a diagonal form. 
We were not always successful in obtaining a diagonal form. Now we 
restrict the set of possible bases even further by demanding that they be 
orthonormal. But we can also restrict our attention to the set of matrices 
which are unitary (or orthogonal) similar to diagonal matrices. It is fortunate 
that this restricted class of matrices includes a rather wide range of cases 
occurring in some of the most important applications of matrices. The main 
goal of this chapter is to define and characterize the class of matrices unitary 
similar to diagonal matrices and to organize computational procedures by 
means of which these diagonal matrices and the necessary matrices of 
transition can be obtained. We also discuss the special cases so important 
in the applications of the theory of matrices. 



EXERCISES 

1. Test the following matrices for orthogonality. If a matrix is orthogonal,
find its inverse.

(a)  [  1/2    √3/2 ]     (b)  [ 1/2    √3/2 ]     (c)  [ 0.6    0.8 ]
     [ −√3/2   1/2  ]          [ √3/2   1/2  ]          [ 0.8   −0.6 ]


2. Which of the following matrices are unitary?

          [ 1 + i   1 - i ]        [ 1   i ]               [ 1   -1 ]
(a)  1/2  [ 1 - i   1 + i ]   (b)  [ i   1 ]   (c)  1/√2   [ i    i ]









3. Find an orthogonal matrix with (1/√2, 1/√2) in the first column. Find an
orthogonal matrix with (1/3, 2/3, 2/3) in the first column.

4. Find a symmetric orthogonal matrix with (1/3, 2/3, 2/3) in the first column.
Compute its square.






5. The following matrices are all orthogonal. Describe the geometric effects
in real Euclidean 3-space of the linear transformations they represent.

     [ 1   0    0 ]        [ -1    0   0 ]        [ -1    0    0 ]
(a)  [ 0   1    0 ]   (b)  [  0   -1   0 ]   (c)  [  0   -1    0 ]
     [ 0   0   -1 ]        [  0    0   1 ]        [  0    0   -1 ]

     [ cos θ   -sin θ   0 ]        [ cos θ   -sin θ    0 ]
(d)  [ sin θ    cos θ   0 ]   (e)  [ sin θ    cos θ    0 ]
     [   0        0     1 ]        [   0        0     -1 ]

Show that these five matrices, together with the identity matrix, each have different
eigenvalues (provided θ is not 0° or 180°), and that the eigenvalues of any third-
order orthogonal matrix must be one of these six cases.

6. If a matrix represents a rotation of R² around the origin through an angle of
θ, then it has the form

           [ cos θ   -sin θ ]
    A(θ) = [ sin θ    cos θ ].

Show that A(θ) is orthogonal. Knowing that A(θ) · A(ψ) = A(θ + ψ), prove that
sin (θ + ψ) = sin θ cos ψ + cos θ sin ψ. Show that if U is an orthogonal 2 × 2
matrix, then U^{-1}A(θ)U = A(±θ).

7. Find the matrix B representing the real quadratic form q(x, y) = ax² +
2bxy + cy². Show that the discriminant D = ac - b² is the determinant of B.
Show that the discriminant is invariant under orthogonal coordinate changes,
that is, changes of coordinates for which the matrix of transition is orthogonal.



7 | Superdiagonal Form

In this section we restrict our attention to vector spaces (and to matrices) 
over the field of complex numbers. We have already observed that not 
every matrix is similar to a diagonal matrix. Thus, it is also true that not 
every matrix is unitary similar to a diagonal matrix. We later restrict our 
attention to a class of matrices which are unitary similar to diagonal matrices. 
As an intermediate step we obtain a relatively simple form to which every 
matrix can be reduced by unitary similar transformations. 

Theorem 7.1. Let σ be any linear transformation of V, a finite dimensional
vector space over the complex numbers, into itself. There exists an ortho-
normal basis of V with respect to which the matrix representing σ is in super-
diagonal form; that is, every element below the main diagonal is zero.

proof. The proof is by induction on n, the dimension of V. The theorem
says there is an orthonormal basis Y = {η_1, ..., η_n} such that σ(η_k) =
Σ_{i=1}^k a_{ik} η_i, the important property being that the summation ends with the
kth term. The theorem is certainly true for n = 1.




Assume the theorem is true for vector spaces of dimension < n. Since
V is a vector space over the complex numbers, σ has at least one eigenvalue.
Let λ_1 be an eigenvalue for σ and let ξ'_1 ≠ 0, ||ξ'_1|| = 1, be a corresponding
eigenvector. There exists a basis, and hence an orthonormal basis, with
ξ'_1 as the first element. Let the basis be X' = {ξ'_1, ..., ξ'_n} and let W be the
subspace spanned by {ξ'_2, ..., ξ'_n}. W is the subspace consisting of all vectors
orthogonal to ξ'_1. For each α = Σ_{i=1}^n a_i ξ'_i define τ(α) = Σ_{i=2}^n a_i ξ'_i ∈ W.
Then τσ restricted to W is a linear transformation of W into itself. According
to the induction assumption, there is an orthonormal basis {η_2, ..., η_n}
of W such that for each η_k, τσ(η_k) is expressible in terms of {η_2, ..., η_k}
alone. We see from the way τ is defined that σ(η_k) is expressible in terms of
{ξ'_1, η_2, ..., η_k} alone. Let η_1 = ξ'_1. Then Y = {η_1, η_2, ..., η_n} is the re-
quired basis.

alternate proof. The proof just given was designed to avoid use of the
concept of adjoint introduced in Section 4. Using that concept, a very much
simpler proof can be given. This proof also proceeds by induction on n. The
assertion for n = 1 is established in the same way as in the first proof given.
Assume the theorem is true for vector spaces of dimension < n. Since V is a
vector space over the complex numbers, σ* has at least one eigenvalue. Let λ_n
be an eigenvalue for σ* and let η_n, ||η_n|| = 1, be a corresponding eigenvector.
Then by Theorem 4.4, W = {η_n}^⊥ is an invariant subspace under σ. Since
η_n ≠ 0, W is of dimension n - 1. According to the induction assumption,
there is an orthonormal basis {η_1, ..., η_{n-1}} of W such that σ(η_k) =
Σ_{i=1}^k a_{ik} η_i for k = 1, 2, ..., n - 1. However, {η_1, ..., η_n} is also an
orthonormal basis of V and σ(η_k) = Σ_{i=1}^k a_{ik} η_i for k = 1, ..., n. □

Corollary 7.2. Over the field of complex numbers, every matrix is unitary 
similar to a superdiagonal matrix. □ 

Theorem 7.1 and Corollary 7.2 depend critically on the assumption that 
the field of scalars is the field of complex numbers. The essential feature 
of this condition is that it guarantees the existence of eigenvalues and eigen- 
vectors. If the field of scalars is not algebraically closed, the theorem is 
simply not true. 
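
In floating-point practice the superdiagonal form of Corollary 7.2 is computed directly as a Schur factorization rather than through the inductive construction above. A minimal sketch, assuming SciPy is available (the matrix is a hypothetical example with no real eigenvalues, which is why the complex form is requested):

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[0.0, 1.0], [-1.0, 0.0]])      # real, eigenvalues +-i

# output='complex' gives T upper triangular (superdiagonal) and Z unitary
# with A = Z T Z*.
T, Z = schur(A, output='complex')

print(np.allclose(Z @ T @ Z.conj().T, A))     # True
print(np.allclose(np.tril(T, -1), 0))         # True: nothing below the diagonal
print(np.diag(T))                             # the eigenvalues, as in Corollary 7.3
```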

Corollary 7.3. The diagonal terms of the superdiagonal matrix representing
σ are the eigenvalues of σ.

proof. If A = [a_{ij}] is in superdiagonal form, then the characteristic
polynomial is (a_{11} - x)(a_{22} - x) ⋯ (a_{nn} - x). □



EXERCISES 

1. Let σ be a linear transformation mapping U into V. Let A be any basis of U
whatever. Show that there is an orthonormal basis B of V such that the matrix
representing σ with respect to A and B is in superdiagonal form. (In this case,
where U and V need not be of the same dimension so that the matrix representing
σ need not be square, by superdiagonal form we mean that all elements below the
main diagonal are zeros.)

2. Let σ be a linear transformation on V and let Y = {η_1, ..., η_n} be an ortho-
normal basis such that the matrix representing σ with respect to Y is in super-
diagonal form. Show that the matrix representing σ* with respect to Y is in sub-
diagonal form; that is, all elements above the main diagonal are zeros.

3. Let σ be a linear transformation on V. Show that there is an orthonormal
basis Y of V such that the matrix representing σ with respect to Y is in subdiagonal
form.



8 | Normal Matrices

It is possible to give a necessary and sufficient condition that a matrix be 
unitary similar to a diagonal matrix. The real value in establishing this 
condition is that several important types of matrices do satisfy this condition. 

Theorem 8.1. A matrix A in superdiagonal form is a diagonal matrix if 
and only if A* A = AA*. 

proof. Let A = [a_{ij}] where a_{ij} = 0 if i > j. Suppose that A*A = AA*.
This means, in particular, that

    Σ_{k=1}^n ā_{ki} a_{ki} = Σ_{k=1}^n a_{ik} ā_{ik}.   (8.1)

But since a_{ij} = 0 for i > j, this reduces to

    Σ_{j=1}^i |a_{ji}|² = Σ_{k=i}^n |a_{ik}|².   (8.2)

Now, if A were not a diagonal matrix, there would be a first index i for
which there exists an index k > i such that a_{ik} ≠ 0. For this choice of the
index i the sum on the left in (8.2) reduces to one term while the sum on the
right contains at least two non-zero terms. Thus,

    Σ_{j=1}^i |a_{ji}|² = |a_{ii}|² = Σ_{k=i}^n |a_{ik}|²,   (8.3)

which is a contradiction. Thus A must be a diagonal matrix.

Conversely, if A is a diagonal matrix, then clearly A*A = AA*. □

A matrix A for which A*A = AA* is called a normal matrix.

Theorem 8.2. A matrix is unitary similar to a diagonal matrix if and only 
if it is normal. 






proof. If A is a normal matrix, then any matrix unitary similar to A is 
also normal. Namely, if U is unitary, then 

    (U*AU)*(U*AU) = U*A*UU*AU
                  = U*A*AU
                  = U*AA*U
                  = U*AUU*A*U
                  = (U*AU)(U*AU)*.   (8.4)

Thus, if A is normal, the superdiagonal form to which it is unitary similar 
is also normal and, hence, diagonal. Conversely, if A is unitary similar to 
a diagonal matrix, it is unitary similar to a normal matrix and it is therefore 
normal itself. □ 

Theorem 8.3. Unitary matrices and Hermitian matrices are normal. 

proof. If U is unitary then U*U = U^{-1}U = UU^{-1} = UU*. If H is
Hermitian then H*H = HH = HH*. □
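
The defining condition A*A = AA* is easy to test numerically. The following sketch is not from the text; the three matrices are small examples chosen only to exercise the definition:

```python
import numpy as np

def is_normal(A, tol=1e-12):
    """Test the defining condition A*A = AA* up to rounding error."""
    A = np.asarray(A, dtype=complex)
    return np.allclose(A.conj().T @ A, A @ A.conj().T, atol=tol)

H = np.array([[2, 1 - 1j], [1 + 1j, 3]])      # Hermitian, hence normal
U = np.array([[0, -1], [1, 0]])               # orthogonal (unitary), hence normal
N = np.array([[1, 1], [0, 1]])                # triangular but not diagonal: not normal

print(is_normal(H), is_normal(U), is_normal(N))   # True True False
```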



EXERCISES 

1. Determine which of the following matrices are orthogonal, unitary, symmetric,
Hermitian, skew-symmetric, skew-Hermitian, or normal.

     [ 1  -2 ]        [ 1   i ]        [ 1   i ]
(a)  [ 2   1 ]   (b)  [ i   1 ]   (c)  [ i   2 ]

     [ 0   1 ]        [ 0  -1 ]        [   1     1 - i ]
(d)  [ 1   0 ]   (e)  [ 1   0 ]   (f)  [ 1 + i     3   ]

          [ 1  -2   2 ]             [ 1   2   2 ]             [ -1   2   2 ]
(g)  1/3  [ 2  -1  -2 ]   (h)  1/3  [ 2  -2   1 ]   (i)  1/3  [  2  -1   2 ]
          [ 2   2   1 ]             [ 2   1  -2 ]             [  2   2  -1 ]

     [  1   1 ]        [ 0  -1  -1 ]
(j)  [ -1   1 ]   (k)  [ 1   0  -1 ]
                       [ 1   1   0 ]




2. Which of the matrices of Exercise 1 are unitary similar to diagonal matrices? 

3. Show that a real skew-symmetric matrix is normal. 

4. Show that a skew-Hermitian matrix is normal. 

5. Show by example that there is a skew-symmetric complex matrix which is 
not normal. 

6. Show by example that there is a symmetric complex matrix which is not 
normal. 

7. Find an example of a normal matrix which is not Hermitian or unitary. 

8. Show that if M = A + Bi where A and B are real and symmetric, then M 
is normal if and only if A and B commute. 




9 | Normal Linear Transformations

Theorem 9.1. If there exists an orthonormal basis consisting of eigen-
vectors of a linear transformation σ, then σ*σ = σσ*.

proof. Let X = {ξ_1, ..., ξ_n} be an orthonormal basis consisting of
eigenvectors of σ. Let λ_i be the eigenvalue corresponding to ξ_i. Then

(σ*(ξ_i), ξ_j) = (ξ_i, σ(ξ_j)) = (ξ_i, λ_j ξ_j) = λ̄_j δ_{ij} = λ̄_i δ_{ij} = λ̄_i(ξ_i, ξ_j) = (λ̄_i ξ_i, ξ_j).

For a fixed ξ_i this equation holds for all ξ_j and, hence, (σ*(ξ_i), α) = (λ̄_i ξ_i, α)
for all α ∈ V. This means σ*(ξ_i) = λ̄_i ξ_i and ξ_i is an eigenvector of σ* with
eigenvalue λ̄_i. Then σσ*(ξ_i) = σ(λ̄_i ξ_i) = λ̄_i λ_i ξ_i = σ*σ(ξ_i). Since σσ* =
σ*σ on a basis of V, σσ* = σ*σ on all of V. □

A linear transformation σ for which σ*σ = σσ* is called a normal linear
transformation. Clearly, a linear transformation is normal if and only if the
matrix representing it (with respect to an orthonormal basis) is normal.

In the proof of Theorem 9.1 the critical step is showing that an eigenvector
of σ is also an eigenvector of σ*. The converse is also true.

Theorem 9.2. If ξ is an eigenvector of a normal linear transformation σ
corresponding to the eigenvalue λ, then ξ is an eigenvector of σ* corresponding
to λ̄.

proof. Since σ is normal, (σ(ξ), σ(ξ)) = (σ*σ(ξ), ξ) = (σσ*(ξ), ξ) =
(σ*(ξ), σ*(ξ)). Since ξ is an eigenvector of σ corresponding to λ, σ(ξ) = λξ
so that

0 = ||σ(ξ) - λξ||² = (σ(ξ) - λξ, σ(ξ) - λξ)

  = (σ(ξ), σ(ξ)) - λ̄(σ(ξ), ξ) - λ(ξ, σ(ξ)) + λλ̄(ξ, ξ)

  = (σ*(ξ), σ*(ξ)) - λ̄(ξ, σ*(ξ)) - λ(σ*(ξ), ξ) + λλ̄(ξ, ξ)

  = (σ*(ξ) - λ̄ξ, σ*(ξ) - λ̄ξ)

  = ||σ*(ξ) - λ̄ξ||².   (9.1)

Thus σ*(ξ) - λ̄ξ = 0, or σ*(ξ) = λ̄ξ. □

Theorem 9.3. For a normal linear transformation, eigenvectors corre-
sponding to different eigenvalues are orthogonal.

proof. Suppose σ(ξ_1) = λ_1ξ_1 and σ(ξ_2) = λ_2ξ_2 where λ_1 ≠ λ_2. Then
λ̄_2(ξ_1, ξ_2) = (ξ_1, λ_2ξ_2) = (ξ_1, σ(ξ_2)) = (σ*(ξ_1), ξ_2) = (λ̄_1ξ_1, ξ_2) = λ̄_1(ξ_1, ξ_2).
Thus (λ̄_1 - λ̄_2)(ξ_1, ξ_2) = 0. Since λ̄_1 - λ̄_2 ≠ 0 we see that (ξ_1, ξ_2) = 0;
that is, ξ_1 and ξ_2 are orthogonal. □

Theorem 9.4. If σ is normal, then (σ(α), σ(β)) = (σ*(α), σ*(β)) for all
α, β ∈ V.

proof. (σ(α), σ(β)) = (σ*σ(α), β) = (σσ*(α), β) = (σ*(α), σ*(β)). □

Corollary 9.5. If σ is normal, ||σ(α)|| = ||σ*(α)|| for all α ∈ V. □




Theorem 9.6. If (σ(α), σ(β)) = (σ*(α), σ*(β)) for all α, β ∈ V, then σ is
normal.

proof. (α, σσ*(β)) = (σ*(α), σ*(β)) = (σ(α), σ(β)) = (α, σ*σ(β)) for all
α, β ∈ V. By Corollary 4.10, σσ* = σ*σ and σ is normal. □

Theorem 9.7. If ||σ(α)|| = ||σ*(α)|| for all α ∈ V, then σ is normal.

proof. We must divide this proof into two cases:

1. V is a vector space over F, a subfield of the real numbers. Then

    (σ(α), σ(β)) = ¼{||σ(α + β)||² - ||σ(α - β)||²}.

It then follows from the hypothesis that (σ(α), σ(β)) = (σ*(α), σ*(β)) for all
α, β ∈ V, and σ is normal.

2. V is a vector space over F, a non-real normal subfield of the complex
numbers. Let a ∈ F be chosen so that ā ≠ a. Then

    (σ(α), σ(β)) = 1/(2(a - ā)) ×
        {a ||σ(α + β)||² - a ||σ(α - β)||² - ||σ(α + aβ)||² + ||σ(α - aβ)||²}.

Again, it follows that σ is normal. □

Theorem 9.8. If σ is normal then K(σ) = K(σ*).

proof. Since ||σ(α)|| = ||σ*(α)||, σ(α) = 0 if and only if σ*(α) = 0. □

Theorem 9.9. If σ is normal, K(σ) = Im(σ)^⊥.

proof. By Theorem 4.5, K(σ*) = Im(σ)^⊥, and by Theorem 9.8, K(σ) =
K(σ*). □

Theorem 9.10. If σ is normal, Im σ = Im σ*.

proof. Im σ = K(σ)^⊥ = Im σ*. □

Theorem 9.11. If σ is a normal linear transformation and W is a set of
eigenvectors of σ, then W^⊥ is an invariant subspace under σ.

proof. α ∈ W^⊥ if and only if (ξ, α) = 0 for all ξ ∈ W. But then (ξ, σ(α)) =
(σ*(ξ), α) = (λ̄ξ, α) = λ̄(ξ, α) = 0. Hence, σ(α) ∈ W^⊥ and W^⊥ is invariant
under σ. □

Notice it is not necessary that W be a subspace, it is not necessary that
W contain all the eigenvectors corresponding to any particular eigenvalue,
and it is not necessary that the eigenvectors in W correspond to the same
eigenvalue. In particular, if ξ is an eigenvector of σ, then {ξ}^⊥ is an invariant
subspace under σ.

Theorem 9.12. Let V be a vector space with an inner product, and let σ be a
normal linear transformation of V into itself. If W is a subspace which is
invariant under both σ and σ*, then σ is normal on W.




proof. Let σ' denote the linear transformation of W into itself induced
by σ. Let (σ')* denote the adjoint of σ' on W. Then for all α, β ∈ W we have

    ((σ')*(α), β) = (α, σ'(β)) = (α, σ(β)) = (σ*(α), β).

Since (((σ')* - σ*)(α), β) = 0 for all α, β ∈ W, (σ')* and σ* coincide on W.
Thus (σ')*σ' = σ*σ = σσ* = σ'(σ')* on W, and σ' is normal. □

Theorem 9.13. Let V be a finite dimensional vector space over the complex
numbers, and let σ be a normal linear transformation on V. If W is invariant
under σ, then W is invariant under σ* and σ is normal on W.

proof. By Theorem 4.4, W^⊥ is invariant under σ*. The restriction of σ*
to W^⊥ (as domain and codomain) is a linear transformation of the finite
dimensional complex vector space W^⊥ into itself, and therefore has at least
one eigenvalue λ with a corresponding non-zero eigenvector ξ. Thus σ*(ξ) =
λξ, and we have found an eigenvector for σ* in W^⊥.

Now proceed by induction. The theorem is certainly true for spaces of
dimension 1. Assume the theorem holds for vector spaces of dimension < n.
By Theorem 9.2, ξ is an eigenvector of σ. By Theorem 9.11, {ξ}^⊥ is invariant
under both σ and σ*. By Theorem 9.12, σ is normal on {ξ}^⊥. Since {ξ} ⊆
W^⊥, W ⊆ {ξ}^⊥. Since dim {ξ}^⊥ = n - 1, the induction assumption applies.
Hence, σ is normal on W and W is invariant under σ*. □

Theorem 9.13 is also true for a vector space over any subfield of the complex 
numbers, but the proof is not particularly instructive and this more general 
form of Theorem 9.13 will not be needed later. 

We should like to obtain a converse of Theorem 9.1 and show that a normal 
linear transformation has enough eigenvectors to make up an orthonormal 
basis. Such a theorem requires some condition to guarantee the existence 
of eigenvalues or eigenvectors. One of the most important general conditions 
is to assume we are dealing with vector spaces over the complex numbers. 

Theorem 9.14. If V is a finite dimensional vector space over the complex
numbers and σ is a normal linear transformation, then V has an orthonormal
basis consisting of eigenvectors of σ.

proof. Let n be the dimension of V. The theorem is certainly true for
n = 1, for if {ξ_1} is a basis, σ(ξ_1) = a_{11}ξ_1.

Assume the theorem holds for vector spaces of dimension < n. Since V
is a finite dimensional vector space over the complex numbers, σ has at
least one eigenvalue λ_1, and corresponding to it a non-zero eigenvector ξ_1
which we can take to be normalized. By Theorem 9.11, {ξ_1}^⊥ is an invariant
subspace under σ. This means that σ acts like a linear transformation on
{ξ_1}^⊥ when we confine our attention to {ξ_1}^⊥. But then σ is also normal on
{ξ_1}^⊥. Since {ξ_1}^⊥ is of dimension n - 1, our induction assumption applies
and {ξ_1}^⊥ has an orthonormal basis {ξ_2, ..., ξ_n} consisting of eigenvectors
of σ. {ξ_1, ξ_2, ..., ξ_n} is the required orthonormal basis of V consisting of
eigenvectors of σ. □

We can observe from examining the proof of Theorem 9.1 that the con-
clusion that σ and σ* commute followed immediately after we showed that
the eigenvectors of σ were also eigenvectors of σ*. Thus the following
theorem follows immediately.

Theorem 9.15. If there exists a basis (orthonormal or not) consisting of
vectors which are eigenvectors for both σ and τ, then στ = τσ. □

Any possible converse to Theorem 9.15 requires some condition to ensure
the existence of the necessary eigenvectors. In the following theorem we
accomplish this by assuming that the field of scalars is the field of complex
numbers; any set of conditions that would imply the existence of the eigen-
vectors could be substituted.

Theorem 9.16. Let V be a finite dimensional vector space over the complex
numbers and let σ and τ be normal linear transformations on V. If στ = τσ,
then there exists an orthonormal basis consisting of vectors which are eigen-
vectors for both σ and τ.

proof. Suppose στ = τσ. Let λ be an eigenvalue of σ and let S(λ) be
the eigenspace of σ consisting of all eigenvectors of σ corresponding to λ.
Then for each ξ ∈ S(λ) we have στ(ξ) = τσ(ξ) = τ(λξ) = λτ(ξ). Hence,
τ(ξ) ∈ S(λ). This shows that S(λ) is an invariant subspace under τ; that is,
τ confined to S(λ) can be considered to be a normal linear transformation of
S(λ) into itself. By Theorem 9.14 there is an orthonormal basis of S(λ) con-
sisting of eigenvectors of τ. Being in S(λ) they are also eigenvectors of σ.
By Theorem 9.3 the basis vectors obtained in this way in eigenspaces
corresponding to different eigenvalues of σ are orthogonal. Again, by
Theorem 9.14 there is a basis of V consisting of eigenvectors of σ. This
implies that the eigenspaces of σ span V and, hence, the entire orthonormal
set obtained in this fashion is an orthonormal basis of V. □

As we have seen, self-adjoint linear transformations and isometries are 
particular cases of normal linear transformations. They can also be char- 
acterized by the nature of their eigenvalues. 

Theorem 9.17. Let V be a finite dimensional vector space over the complex
numbers. A normal linear transformation σ on V is self-adjoint if and only
if all its eigenvalues are real.




proof. Suppose σ is self-adjoint. Let λ be an eigenvalue for σ and let ξ
be an eigenvector corresponding to λ. Then ||σ(ξ)||² = (σ(ξ), σ(ξ)) =
(σ*σ(ξ), ξ) = (σ²(ξ), ξ) = λ²||ξ||². Thus λ² is real and non-negative, and λ is real.

On the other hand, suppose σ is a normal linear transformation and that
all its eigenvalues are real. Since σ is normal there exists a basis X =
{ξ_1, ..., ξ_n} of eigenvectors of σ. Let λ_i be the eigenvalue corresponding
to ξ_i. Then σ*(ξ_i) = λ̄_iξ_i = λ_iξ_i = σ(ξ_i). Since σ* coincides with σ on
a basis of V, σ = σ* on all of V. □

Theorem 9.18. Let V be a finite dimensional vector space over the complex
numbers. A normal linear transformation σ on V is an isometry if and only
if all its eigenvalues are of absolute value 1.

proof. Suppose σ is an isometry. Let λ be an eigenvalue of σ and let ξ
be an eigenvector corresponding to λ. Then ||ξ||² = ||σ(ξ)||² = (σ(ξ),
σ(ξ)) = (λξ, λξ) = |λ|²(ξ, ξ). Hence |λ|² = 1.

On the other hand suppose σ is a normal linear transformation and that
all its eigenvalues are of absolute value 1. Since σ is normal there exists a
basis X = {ξ_1, ..., ξ_n} of eigenvectors of σ. Let λ_i be the eigenvalue
corresponding to ξ_i. Then (σ(ξ_i), σ(ξ_j)) = (λ_iξ_i, λ_jξ_j) = λ_iλ̄_j(ξ_i, ξ_j) = δ_{ij}.
Hence, σ maps an orthonormal basis onto an orthonormal basis and it is
an isometry. □

EXERCISES 

1. Prove Theorem 9.2 directly from Corollary 9.5.

2. Show that if there exists an orthonormal basis such that σ and τ are both
represented by diagonal matrices, then στ = τσ.

3. Show that if σ and τ are normal linear transformations such that στ = τσ,
then there is an orthonormal basis of V such that the matrices representing σ and
τ are both diagonal; that is, σ and τ can be diagonalized simultaneously.

4. Show that the linear transformation associated with a Hermitian form is
self-adjoint.

5. Let f be a Hermitian form and let σ be the associated linear transformation.
Let X = {ξ_1, ..., ξ_n} be a basis of eigenvectors of σ (show that such a basis
exists) and let {λ_1, ..., λ_n} be the corresponding eigenvalues. Let α = Σ_{i=1}^n a_iξ_i
and β = Σ_{i=1}^n b_iξ_i be arbitrary vectors in V. Show that f(α, β) = Σ_{i=1}^n a_ib̄_iλ_i.

6. (Continuation) Let q be the Hermitian quadratic form associated with the
Hermitian form f. Let S be the set of all unit vectors in V; that is, α ∈ S if and only
if ||α|| = 1. Show that the maximum value of q(α) for α ∈ S is the maximum eigen-
value, and the minimum value of q(α) for α ∈ S is the minimum eigenvalue. Show
that q(α) ≠ 0 for all non-zero α ∈ V if all the eigenvalues of f are non-zero and of
the same sign.




7. Let σ be a normal linear transformation and let {λ_1, ..., λ_k} be the distinct
eigenvalues of σ. Let M_i be the subspace of eigenvectors of σ corresponding to λ_i.
Show that V = M_1 ⊥ ⋯ ⊥ M_k.

8. (Continuation) Let π_i be the projection of V onto M_i along M_i^⊥. Show that
1 = π_1 + ⋯ + π_k. Show that σ = λ_1π_1 + ⋯ + λ_kπ_k. Show that σ^r = λ_1^r π_1 +
⋯ + λ_k^r π_k. Show that if p(x) is a polynomial, then p(σ) = Σ_{i=1}^k p(λ_i)π_i.
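
Exercises 7 and 8 describe the spectral resolution σ = λ_1π_1 + ⋯ + λ_kπ_k of a normal transformation. The sketch below is not part of the text; it checks the two identities of Exercise 8 numerically for a hypothetical real symmetric matrix, using NumPy's eigh:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])        # a normal (real symmetric) matrix
eigvals, Q = np.linalg.eigh(A)                # columns of Q: orthonormal eigenvectors

# Orthogonal projections onto the eigenspaces (the eigenvalues here are distinct).
pis = [np.outer(Q[:, i], Q[:, i]) for i in range(len(eigvals))]

# sigma = lambda_1 pi_1 + ... + lambda_k pi_k
print(np.allclose(sum(l * P for l, P in zip(eigvals, pis)), A))       # True

# p(sigma) = sum_i p(lambda_i) pi_i, checked for p(x) = x**2 + 3x + 1
p = lambda x: x**2 + 3*x + 1
print(np.allclose(sum(p(l) * P for l, P in zip(eigvals, pis)),
                  A @ A + 3*A + np.eye(2)))                           # True
```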



10 | Hermitian and Unitary Matrices

Although all the results we state in this section have already been obtained, 
they are sufficiently useful to deserve being summarized separately. In this 
section we are considering matrices whose entries are complex numbers. 

Theorem 10.1. If H is Hermitian, then 

(1) H is unitary similar to a diagonal matrix D. 

(2) The elements along the main diagonal of D are the eigenvalues of H. 

(3) The eigenvalues of H are real. 

Conversely, if H is normal and all its eigenvalues are real, then H is Hermitian.

proof. We have already observed that a Hermitian matrix is normal
so that (1) and (2) follow immediately. Since D is diagonal and Hermitian,
D = D* = D̄ and the eigenvalues are real.

Conversely, if H is a normal matrix with real eigenvalues, then the diagonal 
form to which it is unitary similar must be real and hence Hermitian. Thus 
H itself must be Hermitian. □ 

Theorem 10.2. If A is unitary, then 

(1) A is unitary similar to a diagonal matrix D. 

(2) The elements along the main diagonal of D are the eigenvalues of A. 

(3) The eigenvalues of A are of absolute value 1. 

Conversely, if A is normal and all its eigenvalues are of absolute value 1 , 
then A is unitary. 

proof. We have already observed that a unitary matrix is normal so that
(1) and (2) follow immediately. Since D is also unitary, D̄D = D*D = I
so that |λ_i|² = λ̄_iλ_i = 1 for each eigenvalue λ_i.

Conversely, if A is a normal matrix with eigenvalues of absolute value 1,
then from the diagonal form D we have D*D = D̄D = I so that D and A
are unitary. □

Corollary 10.3. If A is orthogonal, then 

(1) A is unitary similar to a diagonal matrix D. 

(2) The elements along the main diagonal of D are the eigenvalues of A. 

(3) The eigenvalues of A are of absolute value 1. □ 






This is a conventional statement of this corollary and in this form it is 
somewhat misleading. If A is a unitary matrix that happens to be real, 
then this corollary says nothing that is not contained in Theorem 10.2. 
A little more information about A and its eigenvalues is readily available. 
For example, the characteristic equation is real so that the eigenvalues occur 
in conjugate pairs. An orthogonal matrix of odd order has at least one 
real eigenvalue, etc. If A is really an orthogonal matrix, representing an 
isometry in a vector space over the real numbers, then the unitary matrix 
mentioned in the corollary does not necessarily represent a permissible 
change of basis. An orthogonal matrix is not always orthogonal similar 
to a diagonal matrix. As an example, consider the matrix representing a 
90° rotation in the Euclidean plane. However, properly interpreted, the 
corollary is useful. 
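
The 90° rotation mentioned above can be checked directly; the short sketch below (not in the original text) shows why no orthogonal change of basis can diagonalize it over the real numbers:

```python
import numpy as np

A = np.array([[0.0, -1.0], [1.0, 0.0]])       # rotation through 90 degrees

print(np.allclose(A.T @ A, np.eye(2)))        # True: A is orthogonal
print(np.linalg.eigvals(A))                   # [0.+1.j, 0.-1.j]: no real eigenvalues,
                                              # so no real diagonalization; over C it is
                                              # unitary similar to diag(i, -i)
```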



EXERCISES 

1. Find the diagonal matrices to which the following matrices are unitary
similar. Classify each as to whether it is Hermitian, unitary, or orthogonal.

          [ 1 + i   1 - i ]        [ 1   -i ]              [ 1 - i   1 + i ]
(a)  1/2  [ 1 - i   1 + i ]   (b)  [ i    1 ]   (c)   1/2  [ 1 + i   1 - i ]

     [ 0.6   -0.8 ]        [   3     1 - i ]
(d)  [ 0.8    0.6 ]   (e)  [ 1 + i     2   ]



2. Let A be an arbitrary square complex matrix. Since A *A is Hermitian, there is 
a unitary matrix P such that P*A*AP is a diagonal matrix D. Let F = P*AP. 
Show that F*F = D. Show that D is real and the elements of D are non-negative. 

3. Show that every complex matrix can be written as the sum of a real matrix, 
and an imaginary matrix; that is, if M is complex, then M = A + Bi where A and 
B are real. Show that M is Hermitian if and only if A is symmetric and B is skew- 
symmetric. Show that M is skew-Hermitian if and only if A is skew-symmetric and 
B is symmetric. 



11 | Real Vector Spaces

We now wish to consider linear transformations and matrices in vector
spaces over the real numbers. Much of what has been done for complex
vector spaces can be carried over to real vector spaces without any difficulty.
We must be careful, however, when it comes to theorems depending on the
existence of eigenvalues and eigenvectors. In particular, Theorems 7.1 and
7.2 do not carry over as stated. Those parts of Sections 8 and 9 which depend
on these theorems must be reexamined carefully before their implications for
real vector spaces can be established.

An examination of the proof of Theorem 7.1 will reveal that the only use 
made of any special properties of the complex numbers not shared by the 
real numbers was at the point where it was asserted that each linear trans- 
formation has at least one eigenvalue. In stating a corresponding theorem 
for real vector spaces we have to add an assumption concerning the existence 
of eigenvalues. Thus we have the following modification of Theorem 7.1 
for real vector spaces. 

Theorem 11.1. Let V be a finite dimensional vector space over the real
numbers, and let σ be a linear transformation on V whose characteristic poly-
nomial factors into real linear factors. Then there exists an orthonormal basis
of V with respect to which the matrix representing σ is in superdiagonal form.

proof. Let n be the dimension of V. The theorem is certainly true for
n = 1.

Assume the theorem is true for real vector spaces of dimension < n.
Let λ_1 be an eigenvalue for σ and let ξ'_1 ≠ 0, ||ξ'_1|| = 1, be a corresponding
eigenvector. There exists an orthonormal basis with ξ'_1 as the first element.
Let the basis be X' = {ξ'_1, ..., ξ'_n} and let W be the subspace spanned by
{ξ'_2, ..., ξ'_n}. For each α = Σ_{i=1}^n a_i ξ'_i define τ(α) = Σ_{i=2}^n a_i ξ'_i ∈ W. Then
τσ restricted to W is a linear transformation of W into itself.

In the proof of Theorem 7.1 we could apply the induction hypothesis to
τσ without any difficulty since the assumptions of Theorem 7.1 applied to
all linear transformations on V. Now, however, we are dealing with a set of linear
transformations whose characteristic polynomials factor into
real linear factors. Thus we must show that the characteristic polynomial
for τσ factors into real linear factors.

First, consider τσ as defined on all of V. Since τσ(ξ'_1) = τ(λ_1ξ'_1) = 0,
τσ(α) = τσ[τ(α)] = τστ(α) for all α ∈ V. This implies that (τσ)^k(α) = τσ^k(α)
since any τ to the right of a σ can be omitted if there is a τ to the left of that σ.

Let f(x) be the characteristic polynomial for σ. It follows from the
observations of the previous paragraph that τf(τσ) = τf(σ) = 0 on V.
But on W, τ acts like the identity transformation, so that f(τσ) = 0 when
restricted to W. Hence, the minimum polynomial for τσ on W divides f(x).
By assumption, f(x) factors into real linear factors so that the minimum
polynomial for τσ on W must also factor into real linear factors. This
means that the hypotheses of the theorem are satisfied for τσ on W. By
induction, there is an orthonormal basis {η_2, ..., η_n} of W such that for
each η_k, τσ(η_k) is expressible in terms of {η_2, ..., η_k} alone. We see from
the way τ is defined that σ(η_k) is expressible in terms of {ξ'_1, η_2, ..., η_k}
alone. Let η_1 = ξ'_1. Then Y = {η_1, η_2, ..., η_n} is the required basis. □

Since any n × n matrix with real entries represents some linear trans-
formation with respect to any orthonormal basis, we have

Theorem 11.2. Let A be a real matrix with real characteristic values. Then
A is orthogonal similar to a superdiagonal matrix. □

Now let us examine the extent to which Sections 8 and 9 apply to real 
vector spaces. Theorem 8.1 applies to matrices with coefficients in any 
subfield of the complex numbers and we can use it for real matrices without 
reservation. Theorem 8.2 does not hold for real matrices, however. To 
obtain the corresponding theorem over the real numbers we must add the 
assumption that the characteristic values are real. A normal matrix with real 
characteristic values is Hermitian and, being real, it must then be symmetric. 
On the other hand a real symmetric matrix has all real characteristic values. 
Hence, we have 

Theorem 11.3. A real matrix is orthogonal similar to a diagonal matrix if 
and only if it is symmetric. □ 

Because of the importance of real quadratic forms in many applications,
this is a very useful theorem, one of the most important of this chapter. We
describe some of the applications in Chapter VI and show how this theorem
is used.
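
Numerically, Theorem 11.3 is what routines for symmetric eigenproblems deliver: an orthogonal matrix of transition together with the diagonal form. A minimal sketch (not from the text, with a hypothetical symmetric matrix):

```python
import numpy as np

B = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])            # real symmetric

eigvals, P = np.linalg.eigh(B)                # P orthogonal, columns are eigenvectors
print(np.allclose(P.T @ P, np.eye(3)))        # True
print(np.allclose(P.T @ B @ P, np.diag(eigvals)))   # True: the diagonal form
```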

Of the theorems in Section 9 only Theorems 9.14 and 9.16 fail to hold as
stated for real vector spaces. As before, adding the assumption that all the
characteristic values of the linear transformation σ are real to the condition
that σ is normal amounts to assuming that σ is self-adjoint. Hence, the
theorems corresponding to Theorems 9.14 and 9.16 are

Theorem 11.4. If V is a finite dimensional vector space over the real
numbers and σ is a self-adjoint linear transformation on V, then V has an
orthonormal basis consisting of eigenvectors of σ. □

Theorem 11.5. Let V be a finite dimensional vector space over the real
numbers and let σ and τ be self-adjoint linear transformations on V. If στ = τσ,
then there exists an orthonormal basis of V consisting of vectors which are
eigenvectors for both σ and τ. □

Theorem 9.18 must be modified by substituting the words "characteristic
values" for "eigenvalues." Thus,

Theorem 11.6. A normal linear transformation σ defined on a real vector
space V is an isometry if and only if all its characteristic values are of absolute
value 1. □






EXERCISES 

1. For those of the following matrices which are orthogonal similar to diagonal
matrices, find the diagonal form.

     [ 13   6 ]        [ 1   1 ]        [ 1   -1 ]
(a)  [  6  -3 ]   (b)  [ 5   4 ]   (c)  [ 1    1 ]

     [  7    4   -4 ]        [ 13   -4    2 ]        [  3   -4    2 ]
(d)  [  4   -8   -1 ]   (e)  [ -4   13   -2 ]   (f)  [ -4   -1    6 ]
     [ -4   -1   -8 ]        [  2   -2   10 ]        [  2    6   -2 ]

     [ -4    4   -2 ]        [  1   -1    0 ]        [ 0    1    2 ]
(g)  [  4   -4    2 ]   (h)  [ -1    1   -1 ]   (i)  [ 1   -1   -1 ]
     [ -2    2   -1 ]        [  0   -1    1 ]        [ 2   -1    0 ]

     [ 1    2    2 ]        [ 5    2    2 ]
(j)  [ 2    1   -2 ]   (k)  [ 2    2   -4 ]
     [ 2   -2    1 ]        [ 2   -4    2 ]


2. Which of the matrices of Exercise 1, Section 8, are orthogonal similar to 
diagonal matrices ? 

3. Let A and B be real symmetric matrices with A positive definite. There is a 
non-singular matrix P such that P T AP = I. Show that P T BP is symmetric. Show 
that there exists a non-singular matrix Q such that Q T AQ = I and Q T BQ is a 
diagonal matrix. 

4. Show that every real skew-symmetric matrix A has the form A = P T BP 
where P is orthogonal and B 2 is diagonal. 

5. Show that if A and B are real symmetric matrices, and A is positive definite, 
then the roots of det (B — xA) = are all real. 

6. Show that a real skew-symmetric matrix of positive rank is not orthogonal 
similar to a diagonal matrix. 

7. Show that if A is a real 2x2 normal matrix with at least one element equal 
to zero, then it is symmetric or skew-symmetric. 

8. Show that if A is a real 2x2 normal matrix with no zero element, then A 
is symmetric or a scalar multiple of an orthogonal matrix. 

9. Let σ be a skew-symmetric linear transformation on the vector space V over
the real numbers. The matrix A representing σ with respect to an orthonormal
basis is skew-symmetric. Show that the real characteristic values of A are zero. The
characteristic equation may have complex solutions. Show that all complex
solutions are pure imaginary. Why are these solutions not eigenvalues of σ?

10. (Continuation) Show that σ² is symmetric. Show that the characteristic
values of A² are real. Show that the non-zero eigenvalues of A² are negative. Let
-μ² be a non-zero eigenvalue of σ² and let ξ be a corresponding eigenvector. Define
η to be (1/μ)σ(ξ). Show that σ(η) = -μξ. Show that ξ and η are orthogonal. Show
that η is also an eigenvector of σ² corresponding to -μ².

11. (Continuation) Let σ be the skew-symmetric linear transformation con-
sidered in Exercises 9 and 10. Show that there exists an orthonormal basis of V
such that the matrix representing σ has all zero elements except for a sequence of
2 × 2 matrices down the main diagonal of the form

    [  0    -μ_k ]
    [ μ_k     0  ]

where the numbers μ_k are defined as in Exercise 10.

12. Let σ be an orthogonal linear transformation on a vector space V over the
real numbers. Show that the real characteristic values of σ are ±1. Show that any
eigenvector of σ corresponding to a real eigenvalue is also an eigenvector of σ*
corresponding to the same eigenvalue. Show that these eigenvectors are also
eigenvectors of σ + σ* corresponding to the eigenvalues ±2.

13. (Continuation) Show that σ + σ* is self-adjoint. Show that there exists a
basis of eigenvectors of σ + σ*. Show that if an eigenvector of σ + σ* is also an
eigenvector of σ, then the corresponding eigenvalue is ±2. Let 2μ be an eigenvalue
of σ + σ* for which the corresponding eigenvector ξ is not an eigenvector of σ.
Show that μ is real and that |μ| < 1. Show that (ξ, σ(ξ)) = μ(ξ, ξ).

14. (Continuation) Define η to be (σ(ξ) - μξ)/√(1 - μ²). Show that ξ and η are
orthogonal. Show that σ(ξ) = μξ + √(1 - μ²) η, and σ(η) = -√(1 - μ²) ξ + μη.

15. (Continuation) Let σ be the orthogonal linear transformation considered
in Exercises 12, 13, 14. Show that there exists an orthonormal basis of V such that
the matrix representing σ has all zero elements except for a sequence of ±1's and/or
2 × 2 matrices down the main diagonal of the form

    [ cos θ_k   -sin θ_k ]
    [ sin θ_k    cos θ_k ]

where μ_k = cos θ_k are defined as in Exercise 13.

12 | The Computational Processes

We now summarize a complete set of computational steps which will 
effectively determine a unitary (or orthogonal) matrix of transition for 




diagonalizing a given normal matrix. Let A be a given normal matrix. 

1. Determine the characteristic matrix C(x) = A - xI.

2. Compute the characteristic polynomial f(x) = det (A - xI).

3. Determine all eigenvalues of A by finding all the solutions of the
characteristic equation f(x) = 0. In any but very special or contrived
examples this step is tedious and lengthy. In an arbitrarily given example
we can find at best only approximate solutions. In that case all the following
steps are also approximate. In some applications special information deriv-
able from the peculiarities of the application will give information about the
eigenvalues or the eigenvectors without our having to solve the characteristic
equation.

4. For each eigenvalue λ_i find the corresponding eigenvectors by solving
the homogeneous linear equations

    C(λ_i)X = 0.   (12.1)

Each such system of linear equations is of rank less than n. Thus the technique
of Chapter II-7 is the recommended method.

5. Find an orthonormal basis consisting of eigenvectors of A. If the 
eigenvalues are distinct, Theorem 9.3 assures us that they are mutually 
orthogonal. Thus all that must be done is to normalize each vector and the 
required orthonormal basis is obtained immediately. 

Even where a multiple eigenvalue λ_i occurs, Theorem 8.2 or Theorem 9.14
assures us that an orthonormal basis of eigenvectors exists. Thus, the
nullity of C(λ_i) must be equal to the algebraic multiplicity of λ_i. Hence,
there is no difficulty in obtaining a basis of eigenvectors. The problem is 
that the different eigenvectors corresponding to the multiple eigenvalue X t 
are not automatically orthogonal; however, that is easily remedied. All 
we need to do is to take a basis of eigenvectors and use the Gram-Schmidt 
orthonormalization process in each eigenspace. The vectors obtained in 
this way will still be eigenvectors since they are linear combinations of 
eigenvectors corresponding to the same eigenvalue. Vectors from different 
eigenspaces will be orthogonal because of Theorem 9.3. Since eigenspaces 
are seldom of very high dimensions, the amount of work involved in applying 
the Gram-Schmidt process is usually quite nominal. 
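
The five steps can be strung together in code. The sketch below is one possible rendering, not the text's own program: it follows the outline literally (characteristic polynomial, its roots, null spaces, Gram-Schmidt), although in practice a library eigenvalue routine would be called directly; the function name and tolerances are hypothetical choices.

```python
import numpy as np
from scipy.linalg import null_space

def unitary_diagonalizer(A, tol=1e-9):
    """For a normal matrix A, return a unitary P whose columns are orthonormal
    eigenvectors, so that P* A P is diagonal.  Floating point makes steps 2-3
    approximate, as noted in the text."""
    A = np.asarray(A, dtype=complex)
    n = A.shape[0]

    # Steps 1-3: characteristic polynomial and its roots (the eigenvalues).
    eigenvalues = np.roots(np.poly(A))

    # Group numerically equal roots so each eigenspace is handled once.
    distinct = []
    for lam in eigenvalues:
        if not any(abs(lam - mu) < tol for mu in distinct):
            distinct.append(lam)

    # Steps 4-5: solve C(lambda)X = 0 for each eigenvalue, then orthonormalize.
    columns = []
    for lam in distinct:
        basis = null_space(A - lam * np.eye(n), rcond=tol)   # eigenspace basis
        for k in range(basis.shape[1]):
            v = basis[:, k]
            for u in columns:                 # Gram-Schmidt against earlier columns
                v = v - (u.conj() @ v) * u
            columns.append(v / np.linalg.norm(v))
    return np.column_stack(columns)

# Usage: P = unitary_diagonalizer(A); then P.conj().T @ A @ P is (nearly) diagonal.
```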

We now give several examples to illustrate the computational procedures 
and the various diagonalization theorems. Remember that these examples 
are contrived so that the characteristic equation can easily be solved. Ran- 
domly given examples of high order are very likely to result in vexingly 
difficult characteristic equations. 




Example 1. A real symmetric matrix with distinct eigenvalues. Let

        [  1   -2    0 ]
    A = [ -2    2   -2 ].
        [  0   -2    3 ]

We first determine the characteristic matrix,

           [ 1 - x    -2       0   ]
    C(x) = [  -2     2 - x    -2   ],
           [   0      -2     3 - x ]

and then the characteristic polynomial,

    f(x) = det C(x) = -x³ + 6x² - 3x - 10 = -(x + 1)(x - 2)(x - 5).

The eigenvalues are λ_1 = -1, λ_2 = 2, λ_3 = 5.

Solving the equations C(λ_i)X = 0 we obtain the eigenvectors α_1 =
(2, 2, 1), α_2 = (-2, 1, 2), α_3 = (1, -2, 2). Theorem 9.3 assures us that
these eigenvectors are orthogonal, and upon checking we see that they are.
Normalizing them, we obtain the orthonormal basis

    X = {ξ_1 = (1/3)(2, 2, 1), ξ_2 = (1/3)(-2, 1, 2), ξ_3 = (1/3)(1, -2, 2)}.

The orthogonal matrix of transition is

              [ 2   -2    1 ]
    P = (1/3) [ 2    1   -2 ].
              [ 1    2    2 ]
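
A numerical check of Example 1 (this verification is an added illustration, not part of the text):

```python
import numpy as np

A = np.array([[ 1, -2,  0],
              [-2,  2, -2],
              [ 0, -2,  3]], dtype=float)
P = np.array([[2, -2,  1],
              [2,  1, -2],
              [1,  2,  2]], dtype=float) / 3

print(np.allclose(P.T @ P, np.eye(3)))        # True: P is orthogonal
print(np.round(P.T @ A @ P, 10))              # diag(-1, 2, 5), the eigenvalues
```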
Example 2. A real symmetric matrix with repeated eigenvalues. Let

        [ 5    2    2 ]
    A = [ 2    2   -4 ].
        [ 2   -4    2 ]

The corresponding characteristic matrix is

           [ 5 - x     2       2   ]
    C(x) = [   2     2 - x    -4   ]
           [   2      -4     2 - x ]

and the characteristic polynomial is

    f(x) = -x³ + 9x² - 108 = -(x + 3)(x - 6)².

The eigenvalues are λ_1 = -3, λ_2 = λ_3 = 6.

Corresponding to λ_1 = -3, we obtain the eigenvector α_1 = (1, -2, -2).
For λ_2 = λ_3 = 6 we find that the eigenspace S(6) is of dimension 2 and
is the set of all solutions of the equation

    x_1 - 2x_2 - 2x_3 = 0.

Thus S(6) has the basis {(2, 1, 0), (2, 0, 1)}. We can now apply the Gram-
Schmidt process to obtain the orthonormal basis

    { (1/√5)(2, 1, 0), (1/(3√5))(2, -4, 5) }.

Again, by Theorem 9.3 we are assured that α_1 is orthogonal to all vectors
in S(6), and to these vectors in particular. Thus,

    X = { (1/3)(1, -2, -2), (1/√5)(2, 1, 0), (1/(3√5))(2, -4, 5) }

is an orthonormal basis of eigenvectors. The orthogonal matrix of transition
is

        [  1/3    2/√5    2/(3√5) ]
    P = [ -2/3    1/√5   -4/(3√5) ].
        [ -2/3     0      5/(3√5) ]

It is worth noting that, whereas the eigenvector corresponding to an
eigenvalue of multiplicity 1 is unique up to a factor of absolute value 1,
the orthonormal basis of the eigenspace corresponding to a multiple eigen-
value is not unique. In this example, any vector orthogonal to (1, -2, -2)
must be in S(6). Thus {(1/3)(2, 2, -1), (1/3)(2, -1, 2)} would be another choice
for an orthonormal basis for S(6). It happens to result in a slightly simpler
orthogonal matrix of transition (in this case a matrix over the rational
numbers).
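
The Gram-Schmidt step of Example 2 can be reproduced directly; the following sketch (an added illustration, not the text's own computation) orthonormalizes the basis of S(6) and verifies the resulting matrix of transition:

```python
import numpy as np

A = np.array([[5,  2,  2],
              [2,  2, -4],
              [2, -4,  2]], dtype=float)

v1, v2 = np.array([2.0, 1.0, 0.0]), np.array([2.0, 0.0, 1.0])   # basis of S(6)
u1 = v1 / np.linalg.norm(v1)                  # (1/sqrt(5))(2, 1, 0)
w2 = v2 - (u1 @ v2) * u1                      # remove the component along u1
u2 = w2 / np.linalg.norm(w2)                  # (1/(3*sqrt(5)))(2, -4, 5)

u0 = np.array([1.0, -2.0, -2.0]) / 3          # normalized eigenvector for lambda = -3
P = np.column_stack([u0, u1, u2])

print(np.allclose(P.T @ P, np.eye(3)))        # True: orthogonal matrix of transition
print(np.round(P.T @ A @ P, 10))              # diag(-3, 6, 6)
```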

Example 3. A Hermitian matrix. Let

    A = [   2     1 - i ].
        [ 1 + i     3   ]

Then

    C(x) = [ 2 - x    1 - i ]
           [ 1 + i    3 - x ]

and f(x) = x² - 5x + 4 = (x - 1)(x - 4) = 0 is the characteristic equation.
The eigenvalues are λ_1 = 1 and λ_2 = 4. (The example is contrived so that the
eigenvalues are rational, but the fact that they are real is assured by Theorem
10.1.) Corresponding to λ_1 = 1 we obtain the normalized eigenvector
ξ_1 = (1/√3)(-1 + i, 1), and corresponding to λ_2 = 4 we obtain the normalized
eigenvector ξ_2 = (1/√3)(1, 1 + i). The unitary matrix of transition is

    U = (1/√3) [ -1 + i     1   ].
               [    1     1 + i ]

Example 4. An orthogonal matrix. Let

              [  1   -2    2 ]
    A = (1/3) [ -2    1    2 ].
              [ -2   -2   -1 ]

This orthogonal matrix is real but not symmetric. Therefore, it is unitary
similar to a diagonal matrix but it is not orthogonal similar to a diagonal
matrix. We have

           [ 1/3 - x    -2/3       2/3    ]
    C(x) = [  -2/3     1/3 - x     2/3    ]
           [  -2/3      -2/3    -1/3 - x  ]

and, hence, -x³ + ⅓x² - ⅓x + 1 = -(x - 1)(x² + ⅔x + 1) = 0 is the
characteristic equation. Notice that the real eigenvalues of an orthogonal
matrix are particularly easy to find since they must be of absolute value 1.

The eigenvalues are λ_1 = 1, λ_2 = (-1 + 2√2 i)/3, and λ_3 = (-1 - 2√2 i)/3.
The corresponding normalized eigenvectors are ξ_1 = (1/√2)(1, -1, 0), ξ_2 =
(1/2)(1, 1, √2 i), and ξ_3 = (1/2)(1, 1, -√2 i). Thus, the unitary matrix of transition
is

        [  1/√2    1/2     1/2  ]
    U = [ -1/√2    1/2     1/2  ].
        [   0     i/√2   -i/√2  ]





EXERCISES 

1. Apply the computational methods outlined in this section to obtain the 
orthogonal or unitary matrices of transition to diagonalize each of the normal 
matrices given in Exercises 1 of Sections 8, 10, and 11. 

2. Carry out the program outlined in Exercises 12 through 15 of Section 11.
Consider the orthogonal linear transformation σ represented by the orthogonal
matrix



A = 



Find an orthonormal basis of eigenvectors of σ + σ*. Find the representation
of σ with respect to this basis. Since σ + σ* has one eigenvalue of multiplicity 2,
the pairing described in Exercise 14 of Section 11 is not necessary. If σ + σ* had
an eigenvalue of multiplicity 4 or more, such a pairing would be required to obtain
the desired form.



chapter VI

Selected applications of linear algebra



In general, the application of any mathematical theory to any realistic 
problem requires constructing a model of the problem in mathematical 
terminology. How each concept in the model corresponds to a concept 
in the problem requires understanding of both areas on the part of the person 
making the application. If the problem is physical, he must understand the 
physical facts that are to be related. He must also understand how the 
mathematical concepts are related so that he can establish a correspondence 
between the physical concepts and the mathematical concepts. 

If this correspondence has been established in a meaningful way, pre- 
sumably the conclusions in the mathematical model will also have physical 
meaning. If it were not for this aspect of the use of mathematical models, 
mathematics could make little contribution to the problem for it could 
otherwise not reveal any fact or conclusion not already known. The useful-
ness of the model depends on how far from obvious the conclusions are,
and on how well experience verifies the validity of the conclusions.

It must be emphasized that there is no hope of making any meaningful 
numerical computations until the model has been constructed and under- 
stood. Anyone who attempts to apply a mathematical theory to a real
problem without understanding of the model faces the danger of making 
inappropriate applications, or the restriction of doing only what someone 
who does understand has instructed him to do. Too many students limit 
their aims to remembering a sequence of steps that "give the answer" instead 
of understanding the basic principles. 

In the applications chosen for illustration here it is not possible to devote 
more than token attention to the concepts in the field of the application. 
For more details reference will have to be made to other sources. We do 
identify the connection between the concept in the application and the con- 
cept in the model. Considerable attention is given to the construction of 





complete and coherent models. In some cases the model already exists in 
the material that has been developed in the first five chapters. This is true 
to a large extent of the applications to geometry, communication theory, 
differential equations, and small oscillations. In other cases, extensive 
portions of the models must be constructed here. This has been necessary 
for the applications to linear inequalities, linear programming, and repre- 
sentation theory. 

1 | Vector Geometry

This section requires Chapter I, the first eight sections of Chapter II, and 
the first four sections of Chapter IV for background. 

We have already used the geometric interpretation of vectors to give the 
concepts we were discussing a reality. We now develop this interpretation 
in more detail. In doing this we find that the vector space concepts are 
powerful tools in geometry and the geometric concepts are suggestive models 
for corresponding facts about vector spaces. 

We use vector algebra to construct an algebraic model for geometry. 
In our imagination we identify a point P with a vector a from the origin to P. 
a is called the position vector of P. In this way we establish a one-to-one 
correspondence between points and vectors. The correspondence will depend 
on the choice of the origin. If a new origin is chosen, there will result a 
different one-to-one correspondence between points and vectors. The type of 
geometry that is described by the model depends on the type of algebraic 
structure that is given to the model. It is not our purpose here to get involved 
in the details of various types of geometry. We shall be more concerned 
with the ways geometric concepts can be identified with algebraic concepts. 

Let V be a vector space of dimension n. We call a subspace S of dimension 
1 a straight line through the origin. In the familiar model we have in mind 
there are straight lines which do not pass through the origin, so this definition 
must be generalized. A straight line is a set L of the form 

L = a + S (1.1) 

where a is a fixed vector and S is a subspace of dimension 1 . We describe 
this situation by saying that S is a line through the origin and a displaces 
S "parallel" to itself to a new position. 

In general, a linear manifold or flat is a set L of the form 

L = a + S (1.2) 

where a is a fixed vector and S is a subspace. If S is of dimension r, we say 
the linear manifold is of dimension r. A point is a linear manifold of dimen- 
sion 0, a line is of dimension 1 , a plane is of dimension 2, and a hyperplane 
is a linear manifold of dimension n — 1 . 




Let V̂ be the dual space of V and let S^⊥ be the annihilator of S. For every
φ ∈ S^⊥ we have

    φ(L) = φ(α) + φ(S) = φ(α).   (1.3)

On the other hand, let β be any point in V for which φ(β) = φ(α) for all
φ ∈ S^⊥. Then φ(β - α) = φ(β) - φ(α) = 0 so that β - α ∈ S; that is,
β ∈ α + S = L. This means that L is identified by β as well as by α; that is,
L is determined by S and any vector in it.

Let L = α + S be of dimension r. Then S^⊥ is of dimension n - r. Let
{φ_1, ..., φ_{n-r}} be a basis of S^⊥. Then

    φ_i(L) = φ_i(α) + φ_i(S) = φ_i(α) = c_i   (1.4)

for i = 1, ..., n - r. Then β ∈ L if and only if φ_i(β) = c_i for i = 1, ...,
n - r. Thus a linear manifold is determined by giving these n - r con-
ditions, known as linear conditions. The linear manifold L is of dimension
r if and only if the n - r linear conditions are independent.

Two linear manifolds L_1 = α_1 + S_1 and L_2 = α_2 + S_2 are said to be parallel
if and only if either S_1 ⊆ S_2 or S_2 ⊆ S_1. If L_1 and L_2 are of the same dimension,
they are parallel if and only if S_1 = S_2.

Let L_1 and L_2 be parallel and, to be definite, let us take S_1 ⊆ S_2. Suppose
L_1 and L_2 have a point β = α_1 + σ_1 = α_2 + σ_2 in common. Then α_1 =
α_2 + (σ_2 - σ_1) ∈ α_2 + S_2. Hence, L_1 ⊆ L_2. Thus, if two parallel linear
manifolds have a point in common, one is a subset of the other.

Let L = α + S be a linear manifold and let {α_1, ..., α_r} be a basis for S.
Then every vector β ∈ L can be written in the form

    β = α + t_1α_1 + ⋯ + t_rα_r.   (1.5)

As the t_1, ..., t_r run through all values in the field F, β runs through the
linear manifold L. For this reason (1.5) is called a parametric representation








of the linear manifold L. Since α and the basis vectors are subject to a
wide variety of choices, there is no unique parametric representation.

Example. Let α = (1, 2, 3) and let S be a subspace of dimension 1 with
basis {(2, -1, 1)}. Then ξ = (x_1, x_2, x_3) ∈ α + S must satisfy the conditions

    (x_1, x_2, x_3) = (1, 2, 3) + t(2, -1, 1).

In analytic geometry these conditions are usually written in the form

    x_1 = 1 + 2t
    x_2 = 2 - t
    x_3 = 3 + t,

the conventional or extended form of the parametric equations of a line.
The annihilator of S in this case has the basis {[1 2 0], [0 1 1]}. The
equations of the line α + S are then

    x_1 + 2x_2 + 0x_3 = 1·1 + 2·2 + 0·3 = 5,
    0x_1 + x_2 + x_3 = 0·1 + 1·2 + 1·3 = 5.

With a little practice, vector methods are more intuitive and easier than the 
methods of analytic geometry. 
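
The passage from the parametric form to the linear conditions amounts to computing the annihilator of S, that is, the null space of the matrix whose rows span S. A small sketch, assuming SciPy (it returns a different but equivalent basis of the annihilator than the one chosen above):

```python
import numpy as np
from scipy.linalg import null_space

alpha = np.array([1.0, 2.0, 3.0])
S_rows = np.array([[2.0, -1.0, 1.0]])        # rows span S

# Each column of null_space(S_rows) is a functional phi with phi(S) = 0;
# the line is then { x : phi . x = phi . alpha } for every such phi.
for phi in null_space(S_rows).T:
    print(np.round(phi, 4), ". x =", round(float(phi @ alpha), 4))
```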

Suppose we wish to find out whether two lines are coplanar, or whether
they intersect. Let L_1 = α_1 + S_1 and L_2 = α_2 + S_2 be two given lines.
Then S_1 + S_2 is the smallest subspace parallel to both L_1 and L_2. S_1 + S_2
is of dimension 2 unless S_1 = S_2, a special case which is easy to handle.
Thus, assume S_1 + S_2 is of dimension 2. α_1 + S_1 + S_2 is a plane containing
L_1 and parallel to L_2. Thus L_1 and L_2 are coplanar and L_1 intersects L_2 if
and only if α_2 ∈ α_1 + S_1 + S_2. To determine this, find the annihilator of
S_1 + S_2. As in (1.3), α_2 ∈ α_1 + S_1 + S_2 if and only if every functional in
(S_1 + S_2)^⊥ has the same effect on both α_1 and α_2.

Example. Let L_1 = (1, 2, 3) + t(2, -1, 1) and let L_2 = (1, 1, 0)
+ s(1, 0, 2). Then S_1 + S_2 = ⟨(2, -1, 1), (1, 0, 2)⟩ and (S_1 + S_2)^⊥ =
⟨[2 3 -1]⟩. Since [2 3 -1](1, 2, 3) = 5 and [2 3 -1](1, 1, 0) = 5, the
lines L_1 and L_2 both lie in the plane M = (1, 2, 3) + ⟨(2, -1, 1), (1, 0, 2)⟩.

We can easily find the intersection of L_1 and L_2. Since S_1 is a proper sub-
space of S_1 + S_2, (S_1 + S_2)^⊥ is a proper subspace of S_1^⊥. In this case the
difference in dimension is 1, so we can find one linearly independent functional
in S_1^⊥ that is not in (S_1 + S_2)^⊥, for example, [0 1 1]. Then a point of L_2
is in L_1 if and only if [0 1 1] has the same effect on it as it has on (1, 2, 3).
Since [0 1 1](1, 2, 3) = 5 and [0 1 1]{(1, 1, 0) + s(1, 0, 2)} = 1 + 2s,
we see that s = 2. It is easily verified that (1, 1, 0) + 2(1, 0, 2) = (3, 1, 4) =
(1, 2, 3) + (2, -1, 1) is in both L_1 and L_2.
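
The same intersection can be found by solving for the parameters directly. A sketch (an added illustration, not the book's own method):

```python
import numpy as np

a1, d1 = np.array([1.0, 2.0, 3.0]), np.array([2.0, -1.0, 1.0])   # L1 = a1 + t d1
a2, d2 = np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 2.0])    # L2 = a2 + s d2

# a1 + t d1 = a2 + s d2  becomes the 3x2 system  M @ (t, s) = a2 - a1.
M = np.column_stack([d1, -d2])
(t, s), *_ = np.linalg.lstsq(M, a2 - a1, rcond=None)

if np.allclose(M @ np.array([t, s]), a2 - a1):   # consistent: the lines meet
    print("t =", round(t, 6), ", s =", round(s, 6))
    print("intersection:", a1 + t * d1)          # [3. 1. 4.], as found above
```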




An important problem that must be considered is that of finding the
smallest linear manifold containing a given set of points. Let {P_0, P_1, ..., P_r}
be a given set of points and let {α_0, α_1, ..., α_r} be the corresponding set of
position vectors. For the sake of providing a geometric interpretation of the
algebra, we shall speak as though the linear manifold containing the set of
points is the same as the linear manifold containing the set of vectors.

A linear manifold containing these vectors must be of the form L = α + S
where S is a subspace. Since α can be any vector in L, we may as well take
L = α_0 + S. Since α_i - α_0 ∈ S, S contains {α_1 - α_0, ..., α_r - α_0}. On the
other hand, if S contains {α_1 - α_0, ..., α_r - α_0}, then α_0 + S will contain
{α_0, α_1, ..., α_r}. Thus α_0 + ⟨α_1 - α_0, ..., α_r - α_0⟩ is the smallest linear
manifold containing {α_0, ..., α_r}. If the {α_0, ..., α_r} are given arbitrarily,
there is no assurance that {α_1 - α_0, ..., α_r - α_0} will be a linearly inde-
pendent set. If it is, the linear manifold L will be of dimension r. In general,
two points determine a line, three points determine a plane, and n points
determine a hyperplane.

An arbitrary vector β will be in L if and only if it can be represented in the
form

    β = α_0 + t_1(α_1 - α_0) + t_2(α_2 - α_0) + ⋯ + t_r(α_r - α_0).   (1.6)

By setting 1 - t_1 - ⋯ - t_r = t_0, this expression can be written in the form

    β = t_0α_0 + t_1α_1 + ⋯ + t_rα_r,   (1.7)

where

    t_0 + t_1 + ⋯ + t_r = 1.   (1.8)

It is easily seen that (1.8) is a necessary and sufficient condition on the t_i
in order that a linear combination of the form (1.7) lie in L.

We should also like to be able to determine whether the linear manifold
generated by {α_0, α_1, ..., α_r} is of dimension r or of a lower dimension.
For example, when are three points colinear? The dimension of L is less
than r if and only if {α_1 - α_0, ..., α_r - α_0} is a linearly dependent set;
that is, if there exists a non-trivial linear relation of the form

    c_1(α_1 - α_0) + ⋯ + c_r(α_r - α_0) = 0.   (1.9)

This, in turn, can be written in the form

    c_0α_0 + c_1α_1 + ⋯ + c_rα_r = 0,   (1.10)

where

    c_0 + c_1 + ⋯ + c_r = 0.   (1.11)

It is an easy computational problem to determine the c_i, if a non-zero
solution exists. If α_j is represented by (a_{1j}, a_{2j}, ..., a_{nj}), then (1.10) becomes
equivalent to the system of n equations

    a_{i0}c_0 + a_{i1}c_1 + ⋯ + a_{ir}c_r = 0,   i = 1, ..., n.   (1.12)




These equations together with (1.11) form a system of n + 1 homogeneous
linear equations to solve. As a system, they are equivalent to determining
whether the set {(1, a_{1j}, a_{2j}, ..., a_{nj}) | j = 0, 1, ..., r} is linearly dependent.
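
Conditions (1.10)-(1.12) give an immediate computational test. The helper below is a sketch (its name is a hypothetical choice, not the book's): it stacks a row of 1's on top of the coordinate columns and checks for a rank deficiency.

```python
import numpy as np

def affinely_dependent(points):
    """Points are affinely dependent exactly when the columns
    (1, a_1j, ..., a_nj) of this matrix are linearly dependent."""
    A = np.vstack([np.ones(len(points)), np.array(points, dtype=float).T])
    return np.linalg.matrix_rank(A) < len(points)

print(affinely_dependent([(0, 0, 0), (1, 1, 1), (2, 2, 2)]))   # True: colinear
print(affinely_dependent([(0, 0, 0), (1, 0, 0), (0, 1, 0)]))   # False
```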
Geometrically, the pair of conditions (1.7), (1.8) and the pair of conditions
(1.10), (1.11) are independent of the choice of an origin. Suppose the points
P_0, P_1, ..., P_r are identified with position vectors from some other point.
For example, let O be the origin and let O' be a new choice for an origin.
If α' is the position vector of O' with reference to the old origin, then

    α'_i = α_i - α'   (1.13)

is the position vector of P_i relative to O'. If in (1.7) β is the position vector
of B relative to O and β' = β - α' is the position vector of B relative to O',
then (1.7) takes the form

    β' = β - α' = Σ_{k=0}^r t_k α_k - α'

               = Σ_{k=0}^r t_k α_k - Σ_{k=0}^r t_k α'

               = Σ_{k=0}^r t_k α'_k.   (1.7)'

Also, (1.10) takes the form

    Σ_{k=0}^r c_k α'_k = Σ_{k=0}^r c_k (α_k - α')

                      = Σ_{k=0}^r c_k α_k - Σ_{k=0}^r c_k α'

                      = Σ_{k=0}^r c_k α_k = 0.   (1.10)'

Since the pair of conditions (1.7), (1.8) is related to a geometric property,
it should be expected that if a pair of conditions like (1.7), (1.8) holds with one
choice of an origin, then a pair of similar conditions should hold for another
choice. The real importance of the observation above is that the new pair
of conditions involves the same coefficients.

If (1.7) holds subject to (1.8), we call /S an affine combination of {<x , 
a l5 . . . , a r }. If (1.10) holds with coefficients not all zero subject to (1.11), we 
say that {a , a l5 . . . , a r } is an affinely dependent set. The concepts of affine 
combinations, affine dependence, and affine independence are related to each 
other in much the same way that linear combinations, linear dependence, 
and linear independence are related to each other. For example, the affine 
combination (1.7) is unique if and only if the set {a , a 1} . . . , a r } is affinely 
independent. The set {a , a 1? . . . , a r } is affinely dependent if and only if one 
vector is an affine combination of the preceding vectors. 




Affine geometry is important, but its study is much neglected in American high schools and universities. The reason for this is primarily that it is difficult to study affine geometry without using linear algebra as a tool. A good picture of what concepts are involved in affine geometry can be obtained by considering Euclidean geometry in the plane or 3-space in which we "forget" about distance. In Euclidean geometry we study properties that are unchanged by rigid motions. In affine geometry we allow non-rigid motions, provided they preserve intersection, parallelism, and collinearity. If an origin is introduced, a rigid motion can be represented as an orthogonal transformation followed by a translation. An affine transformation can be represented by a non-singular linear transformation followed by a translation. If we accept this assertion, it is easy to see that affine combinations are preserved under affine transformations.
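This invariance can be checked directly in coordinates. In the sketch below (a numerical illustration of the assertion above; the particular matrix, translation, points, and coefficients are arbitrary choices made here) the affine transformation $x \mapsto Ax + b$ is applied both to an affine combination and to the individual points, and the two results agree because the coefficients sum to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))   # non-singular with probability 1
b = rng.standard_normal(2)        # the translation part

def affine_map(x):
    """The affine transformation x -> Ax + b."""
    return A @ x + b

points = [np.array([0.0, 0.0]), np.array([2.0, 1.0]), np.array([1.0, 3.0])]
t = np.array([0.5, 0.3, 0.2])     # coefficients summing to 1

beta = sum(ti * p for ti, p in zip(t, points))                      # affine combination
combination_of_images = sum(ti * affine_map(p) for ti, p in zip(t, points))

print(np.allclose(affine_map(beta), combination_of_images))        # True
```

The agreement depends only on $\sum_k t_k = 1$; for an arbitrary linear combination the translation part would be counted the wrong number of times.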

Theorem 1.1. Let $\{L_\lambda : \lambda \in \Lambda\}$ be any collection of linear manifolds in $V$. Either $\bigcap_{\lambda \in \Lambda} L_\lambda$ is empty or it is a linear manifold.

proof. If $\bigcap_{\lambda \in \Lambda} L_\lambda$ is not empty, let $\alpha_0 \in \bigcap_{\lambda \in \Lambda} L_\lambda$. Since $\alpha_0 \in L_\lambda$ and $L_\lambda$ is a linear manifold, $L_\lambda = \alpha_0 + S_\lambda$ where $S_\lambda$ is a subspace of $V$. Then $\bigcap_{\lambda \in \Lambda} L_\lambda = \bigcap_{\lambda \in \Lambda} (\alpha_0 + S_\lambda) = \alpha_0 + \bigcap_{\lambda \in \Lambda} S_\lambda$. Since $\bigcap_{\lambda \in \Lambda} S_\lambda$ is a subspace, $\bigcap_{\lambda \in \Lambda} L_\lambda$ is a linear manifold. □

Definition. Given any non-empty subset S of V, the affine closure of S is the 
smallest linear manifold containing S. In view of Theorem 1.1, the affine 
closure of S is the intersection of all linear manifolds containing S, and this 
shows that a smallest linear manifold containing S actually exists. The 
affine closure of S is denoted by A(S). 

Theorem 1.2. Let $S$ be any subset of $V$. Let $\bar{S}$ be the set of all affine combinations of finite subsets of $S$. Then $\bar{S}$ is a linear manifold.

proof. Let $\{\beta_0, \beta_1, \ldots, \beta_k\}$ be a finite subset of $\bar{S}$. Each $\beta_j$ is of the form

$\beta_j = \sum_{i=0}^{r_j} x_{ij}\alpha_{ij}$,

where

$\sum_{i=0}^{r_j} x_{ij} = 1$

and each $\alpha_{ij} \in S$. Then for any $\beta$ of the form

$\beta = \sum_{j=0}^{k} t_j\beta_j$

with

$\sum_{j=0}^{k} t_j = 1$,

we have

$\beta = \sum_{j=0}^{k} t_j\left(\sum_{i=0}^{r_j} x_{ij}\alpha_{ij}\right) = \sum_{j=0}^{k}\sum_{i=0}^{r_j} t_jx_{ij}\alpha_{ij}$

and

$\sum_{j=0}^{k}\sum_{i=0}^{r_j} t_jx_{ij} = \sum_{j=0}^{k} t_j\left(\sum_{i=0}^{r_j} x_{ij}\right) = \sum_{j=0}^{k} t_j = 1$.

Thus $\beta \in \bar{S}$. This shows that $\bar{S}$ is closed under affine combinations; that is, an affine combination of a finite number of elements in $\bar{S}$ is also in $\bar{S}$.

The observation of the previous paragraph will allow us to conclude that $\bar{S}$ is a linear manifold. Let $\alpha_0$ be any fixed element in $\bar{S}$. Let $\bar{S} - \alpha_0$ denote the set of elements of the form $\alpha - \alpha_0$ where $\alpha \in \bar{S}$. If $\{\beta_1, \beta_2, \ldots, \beta_k\}$ is a finite subset of $\bar{S} - \alpha_0$, where $\beta_i = \alpha_i - \alpha_0$, then $\sum_{i=1}^{k} c_i\beta_i = \sum_{i=1}^{k} c_i\alpha_i - \sum_{i=1}^{k} c_i\alpha_0 = \sum_{i=0}^{k} c_i\alpha_i - \alpha_0$, where $c_0 = 1 - \sum_{i=1}^{k} c_i$. Thus $\sum_{i=0}^{k} c_i\alpha_i \in \bar{S}$ and $\sum_{i=1}^{k} c_i\beta_i \in \bar{S} - \alpha_0$. This shows that $\bar{S} - \alpha_0$ is a subspace. Hence, $\bar{S} = \alpha_0 + (\bar{S} - \alpha_0)$ is a linear manifold. □

Theorem 1.3. The affine closure of $S$ is the set of all affine combinations of finite subsets of $S$.

proof. Since $\bar{S}$ is a linear manifold containing $S$, $A(S) \subset \bar{S}$. On the other hand, $A(S)$ is a linear manifold containing $S$ and, hence, $A(S)$ contains all affine combinations of elements in $S$. Thus $\bar{S} \subset A(S)$. This shows that $\bar{S} = A(S)$. □

Theorem 1.4. Let $L_1 = \alpha_1 + S_1$ and let $L_2 = \alpha_2 + S_2$. Then $A(L_1 \cup L_2) = \alpha_1 + \langle \alpha_2 - \alpha_1 \rangle + S_1 + S_2$.

proof. Clearly, $L_1 \cup L_2 \subset \alpha_1 + \langle \alpha_2 - \alpha_1 \rangle + S_1 + S_2$ and $\alpha_1 + \langle \alpha_2 - \alpha_1 \rangle + S_1 + S_2$ is a linear manifold containing $L_1 \cup L_2$. Since $\alpha_1 \in L_1 \cup L_2 \subset A(L_1 \cup L_2)$, $A(L_1 \cup L_2)$ is of the form $\alpha_1 + S$ where $S$ is a subspace. And since $\alpha_2 \in L_1 \cup L_2 \subset \alpha_1 + S$, $\alpha_1 + S = \alpha_2 + S$. Thus $\alpha_2 - \alpha_1 \in S$. Since $\alpha_1 + S_1 = L_1 \subset L_1 \cup L_2 \subset \alpha_1 + S$, $S_1 \subset S$. Similarly, $S_2 \subset S$. Thus $\langle \alpha_2 - \alpha_1 \rangle + S_1 + S_2 \subset S$, and $\alpha_1 + \langle \alpha_2 - \alpha_1 \rangle + S_1 + S_2 \subset \alpha_1 + S = A(L_1 \cup L_2)$. This shows that $A(L_1 \cup L_2) = \alpha_1 + \langle \alpha_2 - \alpha_1 \rangle + S_1 + S_2$. □

Theorem 1.5. Let $L_1 = \alpha_1 + S_1$ and let $L_2 = \alpha_2 + S_2$ be linear manifolds. $L_1 \cap L_2 = \emptyset$ if and only if $\alpha_2 - \alpha_1 \notin S_1 + S_2$.

proof. If $L_1 \cap L_2$ is not empty, let $\alpha_0 \in L_1 \cap L_2$. Then $L_1 = \alpha_0 + S_1 = \alpha_1 + S_1$ and $L_2 = \alpha_0 + S_2 = \alpha_2 + S_2$. Thus $\alpha_0 - \alpha_1 \in S_1$ and $\alpha_2 - \alpha_0 \in S_2$. Hence $\alpha_2 - \alpha_1 = (\alpha_0 - \alpha_1) + (\alpha_2 - \alpha_0) \in S_1 + S_2$. Conversely, if $\alpha_2 - \alpha_1 \in S_1 + S_2$ then $\alpha_2 - \alpha_1 = \gamma_1 + \gamma_2$ where $\gamma_1 \in S_1$ and $\gamma_2 \in S_2$. Thus $\alpha_1 + \gamma_1 = \alpha_2 - \gamma_2 \in (\alpha_1 + S_1) \cap (\alpha_2 + S_2)$. □

Corollary 1.6. $L_1 \cap L_2 = \emptyset$ if and only if $\dim A(L_1 \cup L_2) = \dim (S_1 + S_2) + 1$. If $L_1 \cap L_2 \ne \emptyset$, $\dim A(L_1 \cup L_2) = \dim (S_1 + S_2)$. □

We now wish to introduce the idea of betweenness. This is a concept closely tied to the real numbers, so we assume for the rest of this section that $F$ is the field of real numbers or a subfield of the real numbers.

Let $\alpha_1$ and $\alpha_2$ be any two vectors. We have seen that every vector $\beta$ in the line generated by $\alpha_1$ and $\alpha_2$ can be written in the form $\beta = t_1\alpha_1 + t_2\alpha_2$, where $t_1 + t_2 = 1$. This is equivalent to the form

$\beta = (1 - t)\alpha_1 + t\alpha_2$.   (1.14)

For $t = 0$, $\beta = \alpha_1$, and for $t = 1$, $\beta = \alpha_2$. We say that $\beta$ is between $\alpha_1$ and $\alpha_2$ if and only if $t$ is between 0 and 1. The line segment joining $\alpha_1$ and $\alpha_2$ consists of all points of the form $t_1\alpha_1 + t_2\alpha_2$ where $t_1 + t_2 = 1$ and $t_1 \ge 0$ and $t_2 \ge 0$. A subset $C$ of $V$ is said to be convex if, whenever two points are in $C$, every point of the line segment joining them is also in $C$. Clearly, the space $V$ itself is convex, and every subspace of $V$ is convex. Exercise 11 of Chapter IV-4 amounts to showing that the two sides of a hyperplane are convex.

Theorem 1.7. The intersection of any number of convex sets is convex.

proof. Let $\{C_\lambda\}_{\lambda \in \Lambda}$ be a collection of convex sets. If $\alpha_1$ and $\alpha_2$ are in $\bigcap_{\lambda \in \Lambda} C_\lambda$, then for each $\lambda$ both $\alpha_1$ and $\alpha_2$ are in $C_\lambda$. Since $C_\lambda$ is convex, the segment joining $\alpha_1$ and $\alpha_2$ is in $C_\lambda$. Thus the segment is in $\bigcap_{\lambda \in \Lambda} C_\lambda$ and the intersection is convex. □

As a slight generalization of the expression we gave for the line segment joining two points, we define a convex linear combination of elements of a subset $S$ to be a vector $\beta$ expressible in the form

$\beta = t_1\alpha_1 + t_2\alpha_2 + \cdots + t_r\alpha_r$,   (1.15)

where

$t_1 + t_2 + \cdots + t_r = 1, \qquad t_i \ge 0$,   (1.16)

and $\{\alpha_1, \alpha_2, \ldots, \alpha_r\}$ is a finite subset of $S$. If $S$ is a finite subset, a useful and informative picture of the situation is formed in the following way: Imagine the points of $S$ to be contained in a plane or 3-dimensional space. The set of convex linear combinations is then a polygon (in the plane) or polyhedron in which the corners are points of $S$. Those points of $S$ which are not at the corners are contained in the edges, faces, or interior of the polygon or polyhedron. It is the purpose of Theorem 1.9 to prove that this is a dependable picture.

Theorem 1.8. A set $C$ is convex if and only if every convex linear combination of vectors in $C$ is in $C$.

proof. If every convex linear combination of vectors in $C$ is in $C$, then, in particular, every line segment joining a pair of vectors in $C$ is in $C$. Thus $C$ is convex.

On the other hand, assume $C$ is convex and let $\beta = \sum_{i=1}^{r} t_i\alpha_i$ be a convex linear combination of vectors in $C$. For $r = 1$, $\beta = \alpha_1 \in C$; and for $r = 2$, $\beta$ is on the line segment joining $\alpha_1$ and $\alpha_2$, hence $\beta \in C$. Assume that a convex linear combination involving fewer than $r$ elements of $C$ is in $C$. We can assume that $t_r \ne 1$, for otherwise $\beta = \alpha_r \in C$. Then for each $i$, $(1 - t_r)\alpha_i + t_r\alpha_r \in C$ and

$\beta = \sum_{i=1}^{r-1} \frac{t_i}{1 - t_r}\{(1 - t_r)\alpha_i + t_r\alpha_r\}$   (1.17)

is a convex linear combination of $r - 1$ elements of $C$, and is therefore in $C$. □

Let S be any subset of V. The convex hull H(S) of S is the smallest convex 
set containing S. Since V is a convex set containing S, and the intersection 
of all convex sets containing S is a convex set containing S, such a smallest 
set always exists. 

Theorem 1.9. The convex hull of a set $S$ is the set of all convex linear combinations of vectors in $S$.

proof. Let $T$ be the set of all convex linear combinations of vectors in $S$. Clearly, $S \subset T$ and, since $H(S)$ contains all convex linear combinations of vectors in $S$, $T \subset H(S)$. Thus the theorem will be established if we show that $T$ is convex. Let $\alpha, \beta \in T$. These vectors can be expressed in the form

$\alpha = \sum_{i=1}^{r} t_i\alpha_i, \qquad \sum_{i=1}^{r} t_i = 1, \qquad t_i \ge 0, \qquad \alpha_i \in S$,

$\beta = \sum_{i=1}^{r} s_i\alpha_i, \qquad \sum_{i=1}^{r} s_i = 1, \qquad s_i \ge 0, \qquad \alpha_i \in S$,

where both expressions involve the same finite set of elements of $S$. This can be done by adjoining, where necessary, some terms with zero coefficients. Then for $0 \le t \le 1$,

$(1 - t)\alpha + t\beta = \sum_{i=1}^{r} \{(1 - t)t_i + ts_i\}\alpha_i$.

Since $(1 - t)t_i + ts_i \ge 0$ and $\sum_{i=1}^{r} \{(1 - t)t_i + ts_i\} = (1 - t) + t = 1$, $(1 - t)\alpha + t\beta \in T$ and $T$ is a convex set. Thus $T = H(S)$. □
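Theorem 1.9 reduces membership in the convex hull of a finite set to the solvability of the small system (1.15), (1.16), and that system can be tested mechanically. The sketch below is an illustration only; the use of scipy.optimize.linprog is a convenience made here (an effective hand method is developed in Sections 2 and 3), and the data are those of Exercise 4 below.

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(beta, generators):
    """Return True if beta is a convex linear combination of the generators.

    By Theorem 1.9 this holds exactly when there are t_i >= 0 with
    sum_i t_i * alpha_i = beta and sum_i t_i = 1 (conditions (1.15), (1.16)).
    Feasibility is tested with a linear program having a zero objective.
    """
    G = np.asarray(generators, dtype=float)        # rows are the alpha_i
    r = G.shape[0]
    A_eq = np.vstack([G.T, np.ones((1, r))])       # coordinate equations, plus sum = 1
    b_eq = np.append(np.asarray(beta, dtype=float), 1.0)
    res = linprog(c=np.zeros(r), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * r, method="highs")
    return res.status == 0                         # 0 means a feasible t was found

# Exercise 4: is (0, 0) in the convex hull of {(1, 1), (-6, 7), (5, -6)}?
print(in_convex_hull((0, 0), [(1, 1), (-6, 7), (5, -6)]))
# True: t = (1/25, 11/25, 13/25) works.
```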




BIBLIOGRAPHICAL NOTES 

The connections between linear algebra and geometry appear to some extent in almost all expository material on matrix theory. For an excellent classical treatment see M. Bôcher, Introduction to Higher Algebra. For an elegant modern treatment see K. W. Gruenberg and A. J. Weir, Linear Geometry. A very readable exposition that starts from first principles and treats some classical problems with modern tools is available in N. H. Kuiper, Linear Algebra and Geometry.

EXERCISES 

1. For each of the following linear manifolds, write down a parametric representation and a determining set of linear conditions:

(1) $L_1 = (1, 0, 1) + \langle (1, 1, 1), (2, 1, 0) \rangle$.

(2) $L_2 = (1, 2, 2) + \langle (2, 1, -2), (2, -2, 1) \rangle$.

(3) $L_3 = (1, 1, 1, 2) + \langle (0, 1, 0, -1), (2, 1, -2, 3) \rangle$.

2. For $L_1$ and $L_2$ given in Exercise 1, find a parametric representation and linear conditions for $L_1 \cap L_2$.

3. In $R^3$ find the smallest linear manifold $L$ containing $\{(2, 1, 2), (2, 2, 1), (-1, 1, 2)\}$. Show that $L$ is parallel to $L_1 \cap L_2$, where $L_1$ and $L_2$ are given in Exercise 1.

4. Determine whether (0,0) is in the convex hull of S = {(1, 1), (-6, 7), (5, -6)}. 
(This can be determined reasonably well by careful plotting on a coordinate system. 
At least it can be done with sufficient accuracy to make a guess which can be verified. 
For higher dimensions the use of plotting points is too difficult and inaccurate. An 
effective method is given in Section 3.) 

5. Determine whether (0, 0, 0) is in the convex hull of T = {(6, —5, —2), 
(3, -8, 6), (-4, 8, -5), (-9, 2, 8), (-7, -2, 8), (-5, 5, 1)}. 

6. Show that the intersection of two linear manifolds is either empty or a linear 
manifold. 

7. If $L_1$ and $L_2$ are linear manifolds, the join of $L_1$ and $L_2$ is the smallest linear manifold containing both $L_1$ and $L_2$, which we denote by $L_1$ J $L_2$. If $L_1 = \alpha_1 + S_1$ and $L_2 = \alpha_2 + S_2$, show that the join of $L_1$ and $L_2$ is $\alpha_1 + \langle \alpha_2 - \alpha_1 \rangle + S_1 + S_2$.

8. Let $L_1 = \alpha_1 + S_1$ and $L_2 = \alpha_2 + S_2$. Show that if $L_1 \cap L_2$ is not empty, then $L_1$ J $L_2 = \alpha_1 + S_1 + S_2$.

9. Let $L_1 = \alpha_1 + S_1$ and $L_2 = \alpha_2 + S_2$. Show that if $L_1 \cap L_2$ is empty, then $L_1$ J $L_2 \ne \alpha_1 + S_1 + S_2$; that is, $\alpha_2 - \alpha_1 \notin S_1 + S_2$.

10. Show that $\dim L_1$ J $L_2 = \dim (S_1 + S_2)$ if $L_1 \cap L_2 \ne \emptyset$ and $\dim L_1$ J $L_2 = \dim (S_1 + S_2) + 1$ if $L_1 \cap L_2 = \emptyset$.

2 | Finite Cones and Linear Inequalities

This section requires Section 1 for background and, of course, those sec- 
tions required for Section 1. Although some material is independently 
developed here, Section 10 of Chapter II would also be helpful. 




In this section and the following section we assume that F is the field of 
real numbers R, or a subfield of R. 

If a set is closed under multiplication by non-negative scalars, it is called 
a cone. This is in analogy with the familiar cones of elementary geometry 
with vertex at the origin which contain with any point not at the vertex 
all points on the same half-line from the vertex through the point. If the 
cone is also closed under addition, it is called a convex cone. It is easily seen 
that a convex cone is a convex set. 

If $C$ is a convex cone and there exists a finite set of vectors $\{\alpha_1, \ldots, \alpha_p\}$ in $C$ such that every vector in $C$ can be represented as a linear combination of the $\alpha_i$ with non-negative coefficients, a non-negative linear combination, we call $\{\alpha_1, \ldots, \alpha_p\}$ the generators of $C$ and call $C$ a finite cone. The cone generated by a single non-zero vector is called a half-line. A dependable picture of a finite cone is formed by considering the half-lines formed by each of the generators as constituting an edge of a pointed cone as in Fig. 4. By considering a solid circular cone in $R^3$ it should be clear that there are convex cones that are not finite. A finite cone is the convex hull of a finite number of half-lines.

Let $S$ be the largest subspace contained in $C$. If $S = \{0\}$, then $C$ contains no line through the origin. In this case we say that $C$ is pointed. If $S$ is of dimension 1, then $C$ is wedge shaped with $S$ forming the edge of the wedge.

Given any subset $W \subset V$, let $W^+$ denote the set of all linear functionals that take on non-negative values for all $\alpha \in W$; that is, $W^+ = \{\phi \mid \phi\alpha \ge 0$ for all $\alpha \in W\}$. $W^+$ is closed under non-negative linear combinations and is a convex cone in $\hat{V}$. $W^+$ is called the dual cone or polar cone of $W$. Similarly, if $W \subset \hat{V}$, then $W^+$ is the set of all vectors which have non-negative values for all linear functionals in $W$. In this case, too, $W^+$ is called the dual cone of $W$. For the dual of the dual $(W^+)^+$ we write $W^{++}$.

Fig. 4

Theorem 2.1. (1) If $W_1 \subset W_2$, then $W_1^+ \supset W_2^+$.

(2) $(W_1 + W_2)^+ = W_1^+ \cap W_2^+$ if $0 \in W_1 \cap W_2$.

(3) $W_1^+ + W_2^+ \subset (W_1 \cap W_2)^+$.

proof. (1) is obvious.

(2) If $\phi \in W_1^+ \cap W_2^+$, then for all $\alpha = \alpha_1 + \alpha_2$ where $\alpha_1 \in W_1$ and $\alpha_2 \in W_2$ we have $\phi\alpha = \phi\alpha_1 + \phi\alpha_2 \ge 0$. Hence, $W_1^+ \cap W_2^+ \subset (W_1 + W_2)^+$. On the other hand, $W_1 \subset W_1 + W_2$ so that $W_1^+ \supset (W_1 + W_2)^+$. Similarly, $W_2^+ \supset (W_1 + W_2)^+$. Hence, $W_1^+ \cap W_2^+ \supset (W_1 + W_2)^+$. It follows then that $W_1^+ \cap W_2^+ = (W_1 + W_2)^+$.

(3) $W_1 \supset W_1 \cap W_2$ so that $W_1^+ \subset (W_1 \cap W_2)^+$. Similarly, $W_2^+ \subset (W_1 \cap W_2)^+$. It then follows that $W_1^+ + W_2^+ \subset (W_1 \cap W_2)^+$. □

Theorem 2.2. $W \subset W^{++}$ and $W^+ = W^{+++}$.

proof. Let $W \subset V$. If $\alpha \in W$, then $\phi\alpha \ge 0$ for all $\phi \in W^+$. This means that $W \subset W^{++}$. It then follows that $W^+ \subset (W^+)^{++} = W^{+++}$. On the other hand, from part (1) of Theorem 2.1 we have $W^+ \supset (W^{++})^+ = W^{+++}$. Thus $W^+ = W^{+++}$. The situation is the same for $W \subset \hat{V}$. □

A cone $C$ is said to be reflexive if $C = C^{++}$.

Theorem 2.3. A cone is reflexive if and only if it is the dual cone of a set in the dual space.

proof. Suppose $C$ is reflexive. Then $C = C^{++}$ is the dual cone of $C^+$. On the other hand, if $C$ is the dual cone of $W \subset \hat{V}$, then $C = W^+ = W^{+++} = C^{++}$ and $C$ is reflexive. □

The dual cone of a finite cone is called a polyhedral cone. If $C$ is a finite cone in $\hat{V}$ generated by the finite set $G = \{\phi_1, \ldots, \phi_q\}$, then $C^+ = D = \{\alpha \mid \phi_i\alpha \ge 0$ for all $\phi_i \in G\}$. A dependable picture of a polyhedral cone can be formed by considering a finite cone, for we soon show that the two types of cones are equivalent. Each face of the cone is a part of one of the hyperplanes $\{\alpha \mid \phi_i\alpha = 0\}$, and the cone is on the positive side of each of these hyperplanes. In a finite cone the emphasis is on the edges as generating the cone; in a polyhedral cone the emphasis is on the faces as bounding the cone.
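In coordinates, membership in a polyhedral cone is simply the simultaneous satisfaction of the finitely many inequalities $\phi_i\alpha \ge 0$. A minimal sketch, assuming the functionals $\phi_i$ are given as the rows of a matrix (the helper below is an illustrative choice made here, not part of the text):

```python
import numpy as np

def in_polyhedral_cone(alpha, dual_generators):
    """Check alpha against the inequalities phi_i(alpha) >= 0.

    The rows of `dual_generators` represent the functionals phi_1, ..., phi_q
    generating the finite cone whose dual cone D is being described.
    """
    G = np.asarray(dual_generators, dtype=float)
    return bool(np.all(G @ np.asarray(alpha, dtype=float) >= 0))

# The first octant of R^3 is the dual cone of the cone generated by the
# coordinate functionals, represented here by the rows of the identity matrix.
print(in_polyhedral_cone((1, 2, 3), np.eye(3)))    # True
print(in_polyhedral_cone((1, -2, 3), np.eye(3)))   # False
```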

Theorem 2.4. Let $\sigma$ be a linear transformation of $U$ into $V$. If $C$ is a finite cone in $U$, then $\sigma(C)$ is a finite cone. If $D$ is a polyhedral cone in $V$, then $\sigma^{-1}(D)$ is a polyhedral cone.

proof. If $\{\alpha_1, \ldots, \alpha_p\}$ generates $C$, then $\{\sigma(\alpha_1), \ldots, \sigma(\alpha_p)\}$ generates $\sigma(C)$. Let $D$ be a polyhedral cone dual to the finite cone $E$ in $\hat{V}$. The following statements are equivalent: $\alpha \in \sigma^{-1}(D)$; $\sigma(\alpha) \in D$; $\psi\sigma(\alpha) \ge 0$ for all $\psi \in E$; $\hat{\sigma}(\psi)\alpha \ge 0$ for all $\psi \in E$; $\alpha \in (\hat{\sigma}(E))^+$. Thus $\sigma^{-1}(D)$ is dual to the finite cone $\hat{\sigma}(E)$ in $\hat{U}$ and is therefore polyhedral. □

Theorem 2.5. The sum of a finite number of finite cones is a finite cone and the intersection of a finite number of polyhedral cones is a polyhedral cone.

proof. The first assertion of the theorem is obvious. Let $D_1, \ldots, D_r$ be polyhedral cones, and let $C_1, \ldots, C_r$ be the finite cones of which they are the duals. Then $C_1 + \cdots + C_r$ is a finite cone, and by Theorem 2.1, $D_1 \cap \cdots \cap D_r = C_1^+ \cap \cdots \cap C_r^+ = (C_1 + \cdots + C_r)^+$ is polyhedral. □

Theorem 2.6. Every finite cone is polyhedral.

proof. The theorem is obviously true in a vector space of dimension 1. Let $\dim V = n$ and assume the theorem is true in vector spaces of dimension less than $n$.

Let $A = \{\alpha_1, \ldots, \alpha_p\}$ be a finite set generating the finite cone $C$. We can assume that each $\alpha_k \ne 0$. For each $\alpha_k$ let $W_k$ be a complementary subspace of $\langle \alpha_k \rangle$; that is, $V = W_k \oplus \langle \alpha_k \rangle$. Let $\pi_k$ be the projection of $V$ onto $W_k$ along $\langle \alpha_k \rangle$. $\pi_k(C)$ is a finite cone in $W_k$. By the induction assumption it is polyhedral since $\dim W_k = n - 1$. Then $\pi_k^{-1}(\pi_k(C)) = C_k$ is polyhedral by Theorem 2.4. Since $C \subset C_k$ for each $k$, $C$ is contained in the polyhedral cone $C_1 \cap \cdots \cap C_p$.

We must now show that if $\alpha_0 \notin C$ and $C$ is not a half-line, then there is a $C_j$ such that $\alpha_0 \notin C_j$. If not, then suppose $\alpha_0 \in C_j$ for $j = 1, \ldots, p$. Then $\pi_j(\alpha_0) \in \pi_j(C)$ so that there is an $a_j \in F$ such that $\alpha_0 + a_j\alpha_j = \sum_{i=1}^{p} b_{ij}\alpha_i$ where $b_{ij} \ge 0$. We cannot obtain such an expression with $a_j \le 0$, for then $\alpha_0$ would be in $C$. But we can modify these expressions step by step and remove all the terms on the right sides.

Suppose $b_{ij} = 0$ for $i < k$ and $j = 1, \ldots, p$; that is, $\alpha_0 + a_j\alpha_j = \sum_{i=k}^{p} b_{ij}\alpha_i$. This is already true for $k = 1$. Then, for $j = k$,

$\alpha_0 + (a_k - b_{kk})\alpha_k = \sum_{i=k+1}^{p} b_{ik}\alpha_i$.

As before, we cannot have $a_k - b_{kk} \le 0$. Set $a_k - b_{kk} = a_k' > 0$. Then for $j \ne k$ we have

$\left(1 + \frac{b_{kj}}{a_k'}\right)\alpha_0 + a_j\alpha_j = \sum_{i=k+1}^{p}\left(b_{ij} + \frac{b_{kj}}{a_k'}b_{ik}\right)\alpha_i$.

Upon division by $1 + b_{kj}/a_k'$ we get expressions of the form

$\alpha_0 + a_j'\alpha_j = \sum_{i=k+1}^{p} b_{ij}'\alpha_i, \qquad j = 1, 2, \ldots, p$,

with $a_j' > 0$ and $b_{ij}' \ge 0$. Continuing in this way we eventually get $\alpha_0 + c_j\alpha_j = 0$ with $c_j > 0$ for all $j$. This would imply $\alpha_j = -(1/c_j)\alpha_0$ for $j = 1, \ldots, p$, so that $C$ is generated by $\{-\alpha_0\}$; that is, $C$ is a half-line, which is polyhedral. If $C$ is polyhedral there is nothing to prove. If $C$ is not a half-line, the assumption that $\alpha_0 \in C_j$ for all $j$ is untenable. But this means that $C = C_1 \cap \cdots \cap C_p$, in which case $C$ is polyhedral. □

Theorem 2.7. A polyhedral cone is finite.

proof. Let $C = D^+$ be a polyhedral cone dual to the finite cone $D$. We have just proven that a finite cone is polyhedral, so there is a finite cone $E$ such that $D = E^+$. But then $E$ is also polyhedral so that $E = E^{++} = D^+ = C$. Since $E$ is finite, $C$ is also. □

Although polyhedral cones and finite cones are identical, we retain both terms and use whichever is most suitable for the point of view we wish to emphasize. A large number of interesting and important results now follow very easily. The sum and intersection of finite cones are finite cones. The dual cone of a finite cone is finite. A finite cone is reflexive. Also, for finite cones part (3) of Theorem 2.1 can be improved. If $C_1$ and $C_2$ are finite cones, then $(C_1 \cap C_2)^+ = (C_1^{++} \cap C_2^{++})^+ = (C_1^+ + C_2^+)^{++} = C_1^+ + C_2^+$.

Our purpose in introducing this discussion of finite cones was to obtain 
some theorems about linear inequalities, so we now turn our attention to 
that subject. The following theorem is nothing but a paraphrase of the 
statement that a finite cone is reflexive. 

Theorem 2.8. Let

$a_{11}x_1 + \cdots + a_{1n}x_n \ge 0$
$\qquad\vdots$   (2.1)
$a_{m1}x_1 + \cdots + a_{mn}x_n \ge 0$

be a system of linear inequalities. If

$a_1x_1 + \cdots + a_nx_n \ge 0$

is a linear inequality which is satisfied whenever the system (2.1) is satisfied, then there exist non-negative scalars $(y_1, \ldots, y_m)$ such that $\sum_{i=1}^{m} y_ia_{ij} = a_j$ for $j = 1, \ldots, n$.

proof. Let $\phi_i$ be the linear functional represented by $[a_{i1} \cdots a_{in}]$, and let $\phi$ be the linear functional represented by $[a_1 \cdots a_n]$. If $\xi$ represented by $(x_1, \ldots, x_n)$ satisfies the system (2.1), then $\xi$ is in the cone $C^+$ dual to the finite cone $C$ generated by $\{\phi_1, \ldots, \phi_m\}$. Since $\phi\xi \ge 0$ for all $\xi \in C^+$, $\phi \in C^{++} = C$. Thus there exist non-negative $y_i$ such that $\phi = \sum_{i=1}^{m} y_i\phi_i$. The conclusion of the theorem then follows. □

Theorem 2.9. Let $A = \{\alpha_1, \ldots, \alpha_n\}$ be a basis of the vector space $U$ and let $P$ be the finite cone generated by $A$. Let $\sigma$ be a linear transformation of $U$ into $V$ and let $\beta$ be a given vector in $V$. Then one and only one of the following two alternatives holds: either

(1) there is a $\xi \in P$ such that $\sigma(\xi) = \beta$, or

(2) there is a $\psi \in \hat{V}$ such that $\hat{\sigma}(\psi) \in P^+$ and $\psi\beta < 0$.

proof. Suppose (1) and (2) are satisfied at the same time. Then $0 > \psi\beta = \psi\sigma(\xi) = \hat{\sigma}(\psi)\xi \ge 0$, which is a contradiction.

On the other hand, suppose (1) is not satisfied. Since $P$ is a finite cone, $\sigma(P)$ is a finite cone. The insolvability of (1) means that $\beta \notin \sigma(P)$. Since $\sigma(P)$ is also a polyhedral cone, there is a $\psi \in \hat{V}$ such that $\psi\beta < 0$ and $\psi\sigma(P) \ge 0$. But then $\hat{\sigma}(\psi)(P) \ge 0$ so that $\hat{\sigma}(\psi) \in P^+$. □

It is apparent that the assumption that $A$ is a basis of $U$ is not used in the proof of Theorem 2.9. We wish, however, to translate this theorem into matrix notation. If $\xi$ is represented by $X = (x_1, \ldots, x_n)$, then $\xi \in P$ if and only if each $x_i \ge 0$. To simplify notation we write "$X \ge 0$" to mean each $x_i \ge 0$, and we refer to $P$ as the positive orthant. Since the generators of $P$ form a basis of $U$, the generators of $P^+$ are the elements of the dual basis $\hat{A} = \{\hat{\alpha}_1, \ldots, \hat{\alpha}_n\}$. It thus turns out that $P^+$ is the positive orthant of $\hat{U}$.

Let $B = \{\beta_1, \ldots, \beta_m\}$ be a basis of $V$ and $\hat{B} = \{\hat{\beta}_1, \ldots, \hat{\beta}_m\}$ the dual basis in $\hat{V}$. Let $A = [a_{ij}]$ represent $\sigma$ with respect to $A$ and $B$, let $B = (b_1, \ldots, b_m)$ represent $\beta$, and let $Y = [y_1 \cdots y_m]$ represent $\psi$. Then $\hat{\sigma}(\psi)$ is represented by $YA$ and $\hat{\sigma}(\psi) \in P^+$ if and only if $YA \ge 0$. In this notation Theorem 2.9 becomes

Theorem 2.10. One and only one of the following two alternatives holds: either

(1) there is an $X \ge 0$ such that $AX = B$, or

(2) there is a $Y$ such that $YA \ge 0$ and $YB < 0$. □
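Theorem 2.10 asserts an exact alternative, and alternative (1) can be tested numerically as a feasibility problem. A minimal sketch, assuming scipy.optimize.linprog is available (the helper is an illustrative choice made here; when (1) fails the theorem guarantees a $Y$ as in (2), although this sketch does not construct it):

```python
import numpy as np
from scipy.optimize import linprog

def alternative_one_holds(A, B):
    """Test alternative (1) of Theorem 2.10: is there an X >= 0 with AX = B?

    If this feasibility problem has no solution, the theorem guarantees a Y
    with YA >= 0 and YB < 0.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    res = linprog(c=np.zeros(A.shape[1]), A_eq=A, b_eq=B,
                  bounds=[(0, None)] * A.shape[1], method="highs")
    return res.status == 0

# Is (1, 1, 0) in the cone C_2 generated by (0,1,1), (1,-1,0), (1,1,1)?
# (Compare Exercise 7 below; the generators are taken as the columns of A.)
A = [[0, 1, 1], [1, -1, 1], [1, 0, 1]]
print(alternative_one_holds(A, [1, 1, 0]))   # False: no non-negative solution
```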

Rather than continue to make these translations we adopt notational conventions which will make such translations more evident. We write $\xi \ge 0$ to mean $\xi \in P$, $\xi \ge \zeta$ to mean $\xi - \zeta \in P$, $\hat{\sigma}(\psi) \ge 0$ to mean $\hat{\sigma}(\psi) \in P^+$, etc.

Theorem 2.11. With the notation of Theorem 2.9, let $\phi$ be a linear functional in $\hat{U}$, let $g$ be an arbitrary scalar, and assume $\beta \in \sigma(P)$. Then one and only one of the following two alternatives holds: either

(1) there is a $\xi \ge 0$ such that $\sigma(\xi) = \beta$ and $\phi\xi \ge g$, or

(2) there is a $\psi \in \hat{V}$ such that $\hat{\sigma}(\psi) \ge \phi$ and $\psi\beta < g$.

proof. Suppose (1) and (2) are satisfied at the same time. Then $g > \psi\beta = \psi\sigma(\xi) = \hat{\sigma}(\psi)\xi \ge \phi\xi \ge g$, which is a contradiction.

On the other hand, suppose (2) is not satisfied. We wish to find a $\xi \in P$ satisfying the conditions $\sigma(\xi) = \beta$ and $\phi\xi \ge g$ at the same time. We have seen before that vectors and linear transformations can be used to express systems of equations as a single vector equation. A similar technique works here.

Let $U_1 = U \oplus F$ be the set of all pairs $(\xi, x)$ where $\xi \in U$ and $x \in F$. $U_1$ is made into a vector space over $F$ by defining vector addition and scalar multiplication according to the rules

$(\xi_1, x_1) + (\xi_2, x_2) = (\xi_1 + \xi_2, x_1 + x_2), \qquad a(\xi, x) = (a\xi, ax)$.

Let $\bar{P}$ be the set of all $(\xi, x)$ where $\xi \in P$ and $x \ge 0$. It is easily seen that $\bar{P}$ is a finite cone in $U_1$. In a similar way we construct the vector space $V_1 = V \oplus F$.

We then define $\Sigma$ to be the mapping of $U_1$ into $V_1$ which maps $(\xi, x)$ onto $\Sigma(\xi, x) = (\sigma(\xi), \phi\xi - x)$. It can be checked that $\Sigma$ is linear. It is now seen that $(\xi, x) \in \bar{P}$ and $\Sigma(\xi, x) = (\beta, g)$ are together equivalent to the conditions $\xi \in P$, $\sigma(\xi) = \beta$, and $\phi\xi = g + x \ge g$.

To use Theorem 2.9 we must describe $\hat{U}_1$ and $\hat{V}_1$ and determine the adjoint transformation $\hat{\Sigma}$. It is not difficult to see that the dual of $U \oplus F$ is isomorphic to $\hat{U} \oplus F$, where $(\phi, y)$ is the linear functional defined by the formula $(\phi, y)(\xi, x) = \phi\xi + yx$. In a similar way the dual of $V \oplus F$ is isomorphic to $\hat{V} \oplus F$. Then $\hat{\Sigma}(\psi, y)$ applied to $(\xi, x)$ must have the effect

$\hat{\Sigma}(\psi, y)(\xi, x) = (\psi, y)\Sigma(\xi, x)$
$\qquad = (\psi, y)(\sigma(\xi), \phi\xi - x)$
$\qquad = \psi\sigma(\xi) + y(\phi\xi - x)$
$\qquad = \hat{\sigma}(\psi)\xi + y\phi\xi - yx$
$\qquad = (\hat{\sigma}(\psi) + y\phi)\xi - yx$.

This means that $\hat{\Sigma}(\psi, y) = (\hat{\sigma}(\psi) + y\phi, -y)$.

Now suppose there exist $\psi \in \hat{V}$ and $y \in F$ for which $\hat{\Sigma}(\psi, y) \in \bar{P}^+$ and $(\psi, y)(\beta, g) = \psi\beta + yg < 0$. This is in the form of condition (2) of Theorem 2.9 and we wish to show that it cannot hold. $\hat{\Sigma}(\psi, y) \in \bar{P}^+$ means $\hat{\sigma}(\psi) + y\phi \ge 0$ and $-y \ge 0$. If $-y > 0$, then $\hat{\sigma}(\psi/(-y)) \ge \phi$. Since (2) of this theorem is assumed not to hold, this means $(\psi/(-y))\beta \ge g$, or $\psi\beta + yg \ge 0$. If $y = 0$, then $\hat{\sigma}(\psi) \ge 0$. Since $\beta \in \sigma(P)$ by assumption, $\psi\beta = \psi\beta + yg < 0$ would contradict Theorem 2.9. Thus (2) of Theorem 2.9 cannot be satisfied. This implies there is a $(\xi, x) \in \bar{P}$ such that $\Sigma(\xi, x) = (\sigma(\xi), \phi\xi - x) = (\beta, g)$, which proves the theorem. □

Theorem 2.12. Let $P_1$ be the positive orthant in $U$ generated by the basis $A = \{\alpha_1, \ldots, \alpha_n\}$ and $P_2$ the positive orthant in $V$ generated by the basis $B = \{\beta_1, \ldots, \beta_m\}$. Let $\sigma$, $\beta$, and $\phi$ be given and assume $\beta \in \sigma(P_1) + P_2$. For each scalar $g$ one and only one of the following two alternatives holds: either

(1) there is a $\xi \ge 0$ such that $\sigma(\xi) \le \beta$ and $\phi\xi \ge g$, or

(2) there is a $\psi \ge 0$ such that $\hat{\sigma}(\psi) \ge \phi$ and $\psi\beta < g$.

proof. Construct the vector space $U \oplus V$ and define the mapping $\Sigma$ of $U \oplus V$ into $V$ by the rule

$\Sigma(\xi, \eta) = \sigma(\xi) + \eta$.

Then the condition $\Sigma(\xi, \eta) = \beta$ with $(\xi, \eta) \ge 0$ is equivalent to $\beta - \sigma(\xi) = \eta \ge 0$ with $\xi \ge 0$. Since $\hat{\Sigma}(\psi) = (\hat{\sigma}(\psi), \psi)$, the condition $\hat{\Sigma}(\psi) \ge (\phi, 0)$ is equivalent to $\hat{\sigma}(\psi) \ge \phi$ and $\psi \ge 0$. With this interpretation, the two conditions of this theorem are equivalent to the corresponding conditions of Theorem 2.11. □

Theorem 2.13. Let $\sigma$, $\beta$, and $\phi$ be given and assume $\beta \in \sigma(U)$. For each scalar $g$ one and only one of the following two alternatives holds: either

(1) there is a $\xi \in U$ such that $\sigma(\xi) = \beta$ and $\phi\xi \ge g$, or

(2) there is a $\psi \in \hat{V}$ such that $\hat{\sigma}(\psi) = \phi$ and $\psi\beta < g$.

proof. Construct the vector space $U \oplus U$ and define the mapping $\Sigma$ of $U \oplus U$ into $V$ by the rule

$\Sigma(\xi_1, \xi_2) = \sigma(\xi_1) - \sigma(\xi_2) = \sigma(\xi_1 - \xi_2)$.

Let $\xi = \xi_1 - \xi_2$. Then the condition $\Sigma(\xi_1, \xi_2) = \beta$ with $(\xi_1, \xi_2) \ge 0$ is equivalent to $\sigma(\xi) = \beta$ with no other restriction on $\xi$, since every $\xi \in U$ can be represented in the form $\xi = \xi_1 - \xi_2$ with $\xi_1 \ge 0$ and $\xi_2 \ge 0$. Since $\hat{\Sigma}(\psi) = (\hat{\sigma}(\psi), -\hat{\sigma}(\psi))$, the condition $\hat{\Sigma}(\psi) \ge (\phi, -\phi)$ is equivalent to $\hat{\sigma}(\psi) = \phi$. With this interpretation, the two conditions of this theorem are equivalent to the corresponding conditions of Theorem 2.11. □

Notice, in Theorems 2.11, 2.12, and 2.13, how an inequality for one variable corresponds to the condition that the other variable be non-negative, while an equation for one of the variables leaves the other variable unrestricted. For example, we have the conditions $\sigma(\xi) = \beta$ and $\psi \in \hat{V}$ in Theorem 2.11 replaced by $\sigma(\xi) \le \beta$ and $\psi \ge 0$ in Theorem 2.12.

Theorem 2.14. One and only one of the following two alternatives holds: either

(1) there is a $\xi \ge 0$ such that $\sigma(\xi) = 0$ and $\phi\xi > 0$, or

(2) there is a $\psi \in \hat{V}$ such that $\hat{\sigma}(\psi) \ge \phi$.

proof. This theorem follows from Theorem 2.11 by taking $\beta = 0$ and $g > 0$. The assumption that $0 \in \sigma(P)$ is then satisfied automatically and the condition $\psi\beta < g$ is not a restriction. □

Theorem 2.15. One and only one of the following two alternatives holds: either

(1) there is a $\xi \ge 0$ such that $\sigma(\xi) \le \beta$, or

(2) there is a $\psi \ge 0$ such that $\hat{\sigma}(\psi) \ge 0$ and $\psi\beta < 0$.

proof. This theorem follows from Theorem 2.12 by taking $\phi = 0$ and $g < 0$. In this case the assumption that $\beta \in \sigma(P_1) + P_2$ is not satisfied automatically. However, in Theorem 2.12 conditions (1) and (2) are symmetric and this assumption could have been replaced by the dual assumption that $-\phi \in P_1^+ + \hat{\sigma}(-P_2^+)$, and this assumption is satisfied. □

Theorem 2.16. One and only one of the following two alternatives holds: either

(1) there is a $\xi \in U$ such that $\sigma(\xi) = \beta$, or

(2) there is a $\psi \in \hat{V}$ such that $\hat{\sigma}(\psi) = 0$ and $\psi\beta < 0$.

proof. This theorem follows from Theorem 2.13 by taking $\phi = 0$ and $g < 0$. Again, although the condition $\beta \in \sigma(U)$ is not satisfied automatically, the equally sufficient dual condition $\phi \in \hat{\sigma}(\hat{V})$ is satisfied. □

It is sometimes convenient to express condition (2) of Theorem 2.16 in a slightly different form. It is equivalent to assert that there is a $\psi \in \hat{V}$ such that $\hat{\sigma}(\psi) = 0$ and $\psi\beta = 1$. If $\psi$ satisfies condition (2), then $\psi/\psi\beta$ satisfies this condition. In this form Theorem 2.16 is equivalent to Theorem 7.2 of Chapter II, and identical to Theorem 5.5 of Chapter IV.

An application of these theorems to the problem of linear programming is 
made in Section 3. 



BIBLIOGRAPHICAL NOTES 

An excellent expository treatment of this subject with numerous examples is given by 
D. Gale, The Theory of Linear Economic Models. A number of expository and research 
papers and an extensive bibliography are available in Kuhn and Tucker, Linear Inequalities 
and Related Systems, Annals of Mathematics Studies, Study 38. 



EXERCISES 

1. In $R^3$ let $C_1$ be the finite cone generated by $\{(1, 1, 0), (1, 0, -1), (0, -1, 1)\}$. Write down the inequalities that characterize the polyhedral cone $C_1^+$.

2. Find the linear functionals which generate $C_1^+$ as a finite cone.






3. In $R^3$ let $C_2$ be the finite cone generated by $\{(0, 1, 1), (1, -1, 0), (1, 1, 1)\}$. Write down a set of generators of $C_1 + C_2$, where $C_1$ is the finite cone given in Exercise 1.

4. Find a minimum set of generators of $C_1 + C_2$, where $C_1$ and $C_2$ are the cones given in Exercises 1 and 3.

5. Find the generators of $C_1^+ + C_2^+$, where $C_1$ and $C_2$ are the cones given in Exercises 1 and 3.

6. Determine the generators of $C_1 \cap C_2$.
"0 1 1" 



7. Let A = 



1 -1 
1 



and 



B = 



Determine whether there is an X > such that AX = B. (Since the columns of 
A are the generators of C 2 , this is a question of whether (1, 1, 0) is in C 2 . Why?) 

8. Use Theorem 2.16 (or the matrix equivalent) to show that the following system of equations has no solution:

$2x_1 + 2x_2 = -1$.

9. Use Theorem 2.9 (or 2.10) to show that the following system of equations has no non-negative solution:

10. Prove the following theorem: One and only one of the following two alternatives holds: either

(1) there is a $\xi \in U$ such that $\sigma(\xi) \ge \beta$, or

(2) there is a $\psi \ge 0$ such that $\hat{\sigma}(\psi) = 0$ and $\psi\beta > 0$.

11. Use the previous exercise to show that the following system of inequalities has no solution:

$2x_1 + 2x_2 \ge 1$.

12. A vector $\xi$ is said to be positive if $\xi = \sum_{i=1}^{n} x_i\xi_i$ where each $x_i > 0$ and $\{\xi_1, \ldots, \xi_n\}$ is a basis generating the positive orthant. We use the notation $\xi > 0$ to denote the fact that $\xi$ is positive. A vector $\xi$ is said to be semi-positive if $\xi \ge 0$ and $\xi \ne 0$. Use Theorem 2.11 to prove the following theorem. One and only one of the following two alternatives holds: either

(1) there is a semi-positive $\xi$ such that $\sigma(\xi) = 0$, or

(2) there is a $\psi \in \hat{V}$ such that $\hat{\sigma}(\psi)$ is positive.




13. Let $W$ be a subspace of $U$, and let $W^\perp$ be the annihilator of $W$. Let $\{\eta_1, \ldots, \eta_r\}$ be a basis of $W^\perp$. Let $V$ be any vector space of dimension $r$ over the same coefficient field, and let $\{\beta_1, \ldots, \beta_r\}$ be a basis of $V$. Define the linear transformation $\sigma$ of $U$ into $V$ by the rule $\sigma(\xi) = \sum_{i=1}^{r} \eta_i(\xi)\beta_i$. Show that $W$ is the kernel of $\sigma$. Show that $W^\perp = \hat{\sigma}(\hat{V})$.

14. Show that if $W$ is a subspace of $U$, then one and only one of the following two alternatives holds: either

(1) there is a semi-positive vector in $W$, or

(2) there is a positive linear functional in $W^\perp$.

15. Use Theorem 2.12 to prove the following theorem. One and only one of the following two alternatives holds: either

(1) there is a semi-positive $\xi$ such that $\sigma(\xi) \le 0$, or

(2) there is a $\psi \ge 0$ such that $\hat{\sigma}(\psi) > 0$.

3 | Linear Programming

This section requires Section 2 for background. Specifically, Theorem 2.12 
is required. If we were willing to accept that theorem without proof, the 
required background would be reduced to Chapter I, the first eight sections 
of Chapter II, and the first four sections of Chapter IV. 

Given $A = [a_{ij}]$, $B = (b_1, \ldots, b_m)$, and $C = [c_1 \cdots c_n]$, the standard maximum linear programming problem is to find any or all non-negative $X = (x_1, \ldots, x_n)$ which maximize

$CX$   (3.1)

subject to the condition

$AX \le B$.   (3.2)

$CX$ is called the objective function and the linear inequalities contained in $AX \le B$ are called the linear constraints.
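A concrete instance in this notation, solved numerically, may help fix ideas. The data below are invented for illustration, and scipy.optimize.linprog minimizes, so the objective is negated to obtain a maximum; the simplex method itself is developed later in this section.

```python
import numpy as np
from scipy.optimize import linprog

# An invented standard maximum problem: maximize CX subject to AX <= B, X >= 0.
C = np.array([3.0, 2.0])
A = np.array([[1.0, 1.0],
              [2.0, 1.0]])
B = np.array([4.0, 6.0])

# linprog minimizes, so maximize CX by minimizing -CX.
res = linprog(c=-C, A_ub=A, b_ub=B, bounds=[(0, None)] * 2, method="highs")
print("optimal X:", res.x)          # (2, 2)
print("maximum of CX:", -res.fun)   # 10
```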

There are many practical problems which can be formulated in these terms. For example, suppose that a manufacturing plant produces $n$ different kinds of products and that $x_j$ is the amount of the $j$th product that is produced. Such an interpretation imposes the condition $x_j \ge 0$. If $c_j$ is the income from a unit amount of the $j$th product, then $\sum_{j=1}^{n} c_jx_j = CX$ is the total income. Assume that the objective is to operate this business in such a manner as to maximize $CX$.

In this particular problem it is likely that each $c_j$ is positive and that $CX$ can be made large by making each $x_j$ large. However, there are usually practical considerations which limit the quantities that can be produced. For example, suppose that limited quantities of various raw materials to make these products are available. Let $b_i$ be the amount of the $i$th ingredient available. If $a_{ij}$ is the amount of the $i$th ingredient consumed in producing one unit of the $j$th product, then we have the condition $\sum_{j=1}^{n} a_{ij}x_j \le b_i$. These constraints mean that the amount of each product produced must be chosen carefully if $CX$ is to be made as large as possible.

We cannot enter into a discussion of the many interesting and important 
problems that can be formulated as linear programming problems. We 
confine our attention to the theory of linear programming and practical 
methods for finding solutions. Linear programming problems often involve 
large numbers of variables and constraints and the importance of an efficient 
method for obtaining a solution cannot be overemphasized. The simplex 
method presented by G. B. Dantzig in 1949 was the first really prac- 
tical method given for solving such problems, and it provided the stimulus 
for the development of an extensive theory of linear inequalities. It is 
the computational method we describe here. 

The simplex method is deceptively simple and it is possible to solve prob- 
lems of moderate complexity by hand with it. The rationale behind the 
method is more subtle, however. We establish necessary and sufficient 
conditions for the linear programming problem to have solutions and 
determine procedures by which a proposed solution can be tested for opti- 
mality. We describe the simplex method and show why it works before giving 
the details of the computational procedures. 

We must first translate the statement of the linear programming problem into the terminology and notation of vector spaces. Let $U$ and $V$ be vector spaces over $F$ of dimensions $n$ and $m$, respectively. Let $A = \{\alpha_1, \ldots, \alpha_n\}$ be a fixed basis of $U$ and let $B = \{\beta_1, \ldots, \beta_m\}$ be a basis of $V$. If $A = [a_{ij}]$ is a given $m \times n$ matrix, we let $\sigma$ be the linear transformation of $U$ into $V$ represented by $A$ with respect to $A$ and $B$. Let $P_1$ be the finite cone in $U$ generated by $A$, and $P_2$ the finite cone in $V$ generated by $B$.

If $\beta$ is the vector in $V$ represented by $B = (b_1, \ldots, b_m)$, the condition $AX \le B$ is equivalent to saying that $\sigma(\xi) \le \beta$. Let $\hat{A}$ be the basis in $\hat{U}$ dual to $A$ and let $\hat{B}$ be the basis in $\hat{V}$ dual to $B$. Let $\phi$ be the linear functional in $\hat{U}$ represented by $C = [c_1 \cdots c_n]$. In these terms the standard maximum linear programming problem is to find any or all $\xi \ge 0$ which maximize $\phi\xi$ subject to the constraint $\sigma(\xi) \le \beta$.

Let $\hat{\sigma}$ be the dual of $\sigma$. The standard dual linear programming problem is to find any or all $\psi \ge 0$ which minimize $\psi\beta$ subject to the constraint $\hat{\sigma}(\psi) \ge \phi$. If we take $\sigma' = -\hat{\sigma}$, $\beta' = -\beta$, and $\phi' = -\phi$, then the dual problem is to find a $\psi \ge 0$ which maximizes $\psi\beta'$ subject to the constraint $\sigma'(\psi) \le \phi'$. Thus, the relation between the original problem, which we call the primal problem, and the dual problem is symmetric. We could have taken a minimum problem as the primal problem, in which case the dual problem would have been a maximum problem. In this discussion, however, we consistently take the primal problem to be a maximum problem.



Any $\xi \ge 0$ such that $\sigma(\xi) \le \beta$ is called a feasible vector for the standard primal linear programming problem. If a feasible vector exists, the primal problem is said to be feasible. Any $\psi \ge 0$ such that $\hat{\sigma}(\psi) \ge \phi$ is called a feasible vector for the dual problem, and if such a vector exists, the dual problem is said to be feasible. A feasible vector $\xi_0$ such that $\phi\xi_0 \ge \phi\xi$ for all feasible $\xi$ is called an optimal vector for the primal problem.

Theorem 3.1. The standard linear programming problem has a solution if and only if both the primal problem and the dual problem are feasible. The dual problem has a solution if and only if the primal problem has a solution, and the maximum value of $\phi\xi$ for the primal problem is equal to the minimum value of $\psi\beta$ for the dual problem.

proof. If the primal linear programming problem is infeasible, then certainly no optimal vector exists. If the primal problem is feasible, then the assumption $\beta \in \sigma(P_1) + P_2$ of Theorem 2.12 is satisfied. If the dual problem is infeasible, then condition (2) of Theorem 2.12 cannot be satisfied. Thus for every $g$ there is a $\xi \ge 0$ such that $\sigma(\xi) \le \beta$ and $\phi\xi \ge g$. This means the values of $\phi\xi$ are unbounded and the primal problem has no solution.

Now, assume that both the primal problem and the dual problem are feasible. If $\xi$ is feasible for the primal problem and $\psi$ is feasible for the dual problem, then $0 \le \psi\{\beta - \sigma(\xi)\} = \psi\beta - \psi\sigma(\xi) = \psi\beta - \hat{\sigma}(\psi)\xi = \psi\beta - \phi\xi + \{\phi - \hat{\sigma}(\psi)\}\xi \le \psi\beta - \phi\xi$. Thus $\phi\xi$ is a lower bound for the values of $\psi\beta$. Assume, for now, that $F$ is the field of real numbers and let $g$ be the greatest lower bound of the values of $\psi\beta$ for feasible $\psi$. With this value of $g$ condition (2) of Theorem 2.12 cannot be satisfied, so that there exists a feasible $\xi_0$ such that $\phi\xi_0 \ge g$. Since $\phi\xi_0$ is also a lower bound for the values of $\psi\beta$, $\phi\xi_0 > g$ is impossible. Thus $\phi\xi_0 = g$ and $\phi\xi_0 \ge \phi\xi$ for all feasible $\xi$. Because of the symmetry between the primal and dual problems the dual problem has a solution under exactly the same conditions. Furthermore, since $g$ is the greatest lower bound for the values of $\psi\beta$, $g$ is also the minimum value of $\psi\beta$ for feasible $\psi$.

If we permit $F$ to be a subfield of the real numbers, but do not require that it be the field of real numbers, then we cannot assert that the value of $g$ chosen as a greatest lower bound must be in $F$. Actually, it is true that $g$ is in $F$, and with a little more effort we could prove it at this point. However, if $A$, $B$, and $C$ have components in a subfield of the real numbers we can consider them as representing linear transformations and vectors in spaces over the real numbers. Under these conditions the argument given above is valid. Later, when we describe the simplex method, we shall see that the components of $\xi_0$ will be computed rationally in terms of the components of $A$ and $B$ and will lie in any field containing the components of $A$, $B$, and $C$. We then see that $g$ is in $F$. □

Theorem 3.2. If $\xi_0$ is feasible for the standard primal problem and $\psi_0$ is feasible for the dual problem, then $\xi_0$ is optimal for the primal problem and $\psi_0$ is optimal for the dual problem if and only if $\psi_0\{\beta - \sigma(\xi_0)\} = 0$ and $\{\hat{\sigma}(\psi_0) - \phi\}\xi_0 = 0$, or if and only if $\phi\xi_0 = \psi_0\beta$.

proof. Suppose that $\xi_0$ is feasible for the primal problem and $\psi_0$ is feasible for the dual problem. Then $0 \le \psi_0\{\beta - \sigma(\xi_0)\} = \psi_0\beta - \psi_0\sigma(\xi_0) = \psi_0\beta - \hat{\sigma}(\psi_0)\xi_0 = \psi_0\beta - \phi\xi_0 - \{\hat{\sigma}(\psi_0) - \phi\}\xi_0 \le \psi_0\beta - \phi\xi_0$. It is clear that $\psi_0\{\beta - \sigma(\xi_0)\} = 0$ and $\{\hat{\sigma}(\psi_0) - \phi\}\xi_0 = 0$ if and only if $\psi_0\beta = \phi\xi_0$.

If $\xi_0$ and $\psi_0$ are feasible and $\phi\xi_0 = \psi_0\beta$, then $\phi\xi \le \psi_0\beta = \phi\xi_0$ for all feasible $\xi$. Thus $\xi_0$ is optimal. A similar argument shows that $\psi_0$ is optimal. On the other hand, suppose $\xi_0$ and $\psi_0$ are optimal. Let $\psi_0\beta = g$. Then condition (2) of Theorem 2.12 cannot be satisfied for this choice of $g$. Thus, there is a feasible $\xi$ such that $\phi\xi \ge g$. Since $\xi_0$ is optimal, we have $\phi\xi_0 \ge \phi\xi \ge g = \psi_0\beta$. Since $\phi\xi_0 \le \psi_0\beta$, this means $\phi\xi_0 = \psi_0\beta$. □

Theorem 3.2 has an important interpretation in terms of the inequalities of the linear programming problem as originally stated. Let $\zeta = \beta - \sigma(\xi)$ be represented by $Z = (z_1, \ldots, z_m)$ and let $\eta = \hat{\sigma}(\psi) - \phi$ be represented by $W = [w_1 \cdots w_n]$. Then the feasibility of $\xi$ and $\psi$ implies $z_i \ge 0$ and $w_j \ge 0$. The condition $\psi\zeta = \sum_{i=1}^{m} y_iz_i = 0$ means $y_iz_i = 0$ for each $i$. Thus $z_i > 0$ implies $y_i = 0$, and $y_i > 0$ implies $z_i = 0$. This means that if $\sum_{j=1}^{n} a_{ij}x_j \le b_i$ is satisfied as a strict inequality, then $y_i = 0$, and if $y_i > 0$, this inequality must be satisfied as an equality. A similar relation holds between the $x_j$ and the dual constraints. This gives an effective test for the optimality of a proposed feasible pair $X = (x_1, \ldots, x_n)$ and $Y = [y_1 \cdots y_m]$.
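In matrix terms the test reads: for a feasible pair $X$, $Y$, optimality holds exactly when $y_iz_i = 0$ for every slack $z_i = b_i - \sum_j a_{ij}x_j$ and $w_jx_j = 0$ for every $w_j = \sum_i y_ia_{ij} - c_j$, equivalently when $CX = YB$. A minimal sketch of such a test (the helper and the tolerance are illustrative choices made here):

```python
import numpy as np

def optimal_pair(A, B, C, X, Y, tol=1e-9):
    """Test a feasible pair (X, Y) for optimality via Theorem 3.2.

    X is feasible for: maximize CX subject to AX <= B, X >= 0.
    Y is feasible for: minimize YB subject to YA >= C, Y >= 0.
    Both are optimal exactly when Y(B - AX) = 0 and (YA - C)X = 0,
    equivalently CX = YB.
    """
    A, B, C = np.asarray(A, float), np.asarray(B, float), np.asarray(C, float)
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    Z = B - A @ X            # primal slacks, z_i >= 0 for a feasible X
    W = Y @ A - C            # dual slacks, w_j >= 0 for a feasible Y
    return abs(Y @ Z) < tol and abs(W @ X) < tol

# The example of the earlier sketch: optimum X = (2, 2) with value 10.
A = [[1, 1], [2, 1]]; B = [4, 6]; C = [3, 2]
print(optimal_pair(A, B, C, X=[2, 2], Y=[1, 1]))   # True: CX = YB = 10
print(optimal_pair(A, B, C, X=[0, 4], Y=[1, 1]))   # False
```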

It is more convenient to describe the simplex method in terms of solving the equation $\sigma(\xi) = \beta$ instead of the inequality $\sigma(\xi) \le \beta$. Although these two problems are not the same problem, the two types of problems are equivalent. In other words, to every problem involving an inequality there is an equivalent problem involving an equation, and to every problem involving an equation there is an equivalent problem involving an inequality. To see this, construct the vector space $U \oplus V$ and define the mapping $\sigma_1$ of $U \oplus V$ into $V$ by the rule $\sigma_1(\xi, \eta) = \sigma(\xi) + \eta$. Then the equation $\sigma_1(\xi, \eta) = \beta$ with $\xi \ge 0$ and $\eta \ge 0$ is equivalent to the inequality $\sigma(\xi) \le \beta$ with $\xi \ge 0$. This shows that to each problem involving an inequality there is an equivalent problem involving an equation.

To see the converse, construct the vector space $V \oplus V$ and define the mapping $\sigma_2$ of $U$ into $V \oplus V$ by the rule $\sigma_2(\xi) = (\sigma(\xi), -\sigma(\xi))$. Then the inequality $\sigma_2(\xi) \le (\beta, -\beta)$ with $\xi \ge 0$ is equivalent to the equation $\sigma(\xi) = \beta$ with $\xi \ge 0$.
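In matrix terms the first construction is the familiar introduction of slack variables: $AX \le B$, $X \ge 0$ becomes $[A \; I](X, Z) = B$ with $(X, Z) \ge 0$. A minimal sketch (illustrative only):

```python
import numpy as np

def add_slacks(A, B):
    """Convert AX <= B, X >= 0 into A1 * (X, Z) = B, (X, Z) >= 0.

    This is the map sigma_1(xi, eta) = sigma(xi) + eta in coordinates:
    one slack variable z_i = b_i - sum_j a_ij x_j is adjoined per inequality.
    """
    A = np.asarray(A, dtype=float)
    m = A.shape[0]
    return np.hstack([A, np.eye(m)]), np.asarray(B, dtype=float)

A1, B1 = add_slacks([[1, 1], [2, 1]], [4, 6])
print(A1)   # [[1. 1. 1. 0.]
            #  [2. 1. 0. 1.]]
```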

Given a linear transformation $\sigma$ of $U$ into $V$, $\beta \in V$, and $\phi \in \hat{U}$, the canonical maximum linear programming problem is to find any or all $\xi \ge 0$ which maximize $\phi\xi$ subject to the constraint $\sigma(\xi) = \beta$. With this formulation of the linear programming problem it is necessary to see what becomes of the dual problem. Referring to $\sigma_2$ above, for $(\psi_1, \psi_2) \in \hat{V} \oplus \hat{V}$ we have $\hat{\sigma}_2(\psi_1, \psi_2)\xi = (\psi_1, \psi_2)\sigma_2(\xi) = (\psi_1, \psi_2)(\sigma(\xi), -\sigma(\xi)) = \psi_1\sigma(\xi) - \psi_2\sigma(\xi) = \hat{\sigma}(\psi_1 - \psi_2)\xi$. Thus, if we let $\psi = \psi_1 - \psi_2$, we see that we must have $\hat{\sigma}(\psi) = \hat{\sigma}_2(\psi_1, \psi_2) \ge \phi$ and $(\psi_1, \psi_2) \in P_2^+ \oplus P_2^+$. But the condition $(\psi_1, \psi_2) \in P_2^+ \oplus P_2^+$ is not a restriction on $\psi$ since any $\psi \in \hat{V}$ can be written in the form $\psi = \psi_1 - \psi_2$ where $\psi_1 \ge 0$ and $\psi_2 \ge 0$. Thus, the canonical dual linear programming problem is to find any or all $\psi \in \hat{V}$ which minimize $\psi\beta$ subject to the constraint $\hat{\sigma}(\psi) \ge \phi$.

It is readily apparent that conditions (1) and (2) of Theorem 2.11 play the same roles with respect to the canonical primal and dual problems that the corresponding conditions (1) and (2) of Theorem 2.12 play with respect to the standard primal and dual problems. Thus, theorems like Theorems 3.1 and 3.2 can be stated for the canonical problems.

The canonical primal problem is feasible if there is a $\xi \ge 0$ such that $\sigma(\xi) = \beta$, and the canonical dual problem is feasible if there is a $\psi \in \hat{V}$ such that $\hat{\sigma}(\psi) \ge \phi$.

Theorem 3.3. The canonical linear programming problem has a solution if and only if both the primal problem and the dual problem are feasible. The dual problem has a solution if and only if the primal problem has a solution, and the maximum value of $\phi\xi$ for the primal problem is equal to the minimum value of $\psi\beta$ for the dual problem. □

Theorem 3.4. If $\xi_0$ is feasible for the canonical primal problem and $\psi_0$ is feasible for the dual problem, then $\xi_0$ is optimal for the primal problem and $\psi_0$ is optimal for the dual problem if and only if $\{\hat{\sigma}(\psi_0) - \phi\}\xi_0 = 0$, or if and only if $\phi\xi_0 = \psi_0\beta$. □
From now on, assume that the canonical primal linear programming problem is feasible; that is, $\beta \in \sigma(P_1)$. There is no loss of generality in assuming that $\sigma(P_1)$ spans $V$, for in any case $\beta$ is in the subspace of $V$ spanned by $\sigma(P_1)$ and we could restrict our attention to that subspace. Thus $\{\sigma(\alpha_1), \ldots, \sigma(\alpha_n)\} = \sigma(A)$ also spans $V$ and $\sigma(A)$ contains a basis of $V$. A feasible vector $\xi$ is called a basic feasible vector if $\xi$ can be expressed as a linear combination of $m$ vectors in $A$ which are mapped onto a basis of $V$. The corresponding subset of $A$ is said to be feasible.

Since $A$ is a finite set there are only finitely many feasible subsets of $A$. Suppose that $\xi$ is a basic feasible vector expressible in terms of the feasible subset $\{\alpha_1, \ldots, \alpha_m\}$, $\xi = \sum_{i=1}^{m} x_i\alpha_i$. Then $\sigma(\xi) = \sum_{i=1}^{m} x_i\sigma(\alpha_i) = \beta$. Since $\{\sigma(\alpha_1), \ldots, \sigma(\alpha_m)\}$ is a basis, the representation of $\beta$ in terms of that basis is unique. Thus, to each feasible subset of $A$ there is one and only one basic feasible vector; that is, there are only finitely many basic feasible vectors.

Theorem 3.5. If the canonical primal linear programming problem is feasible, then there exists a basic feasible vector. If the canonical primal problem has a solution, then there exists a basic feasible vector which is optimal.

proof. Let $\xi = \sum_{i=1}^{k} x_i\alpha_i$ be feasible. If $\{\sigma(\alpha_1), \ldots, \sigma(\alpha_k)\}$ is linearly independent, then $\xi$ is a basic feasible vector since $\{\alpha_1, \ldots, \alpha_k\}$ can be extended to a feasible subset of $A$. Suppose this set is linearly dependent; that is, $\sum_{i=1}^{k} t_i\sigma(\alpha_i) = 0$ where at least one $t_i > 0$. Then $\sum_{i=1}^{k} (x_i - at_i)\sigma(\alpha_i) = \beta$ for every $a \in F$. Let $a$ be the minimum of $x_i/t_i$ for those $t_i > 0$. For notational convenience, let $x_k/t_k = a$. Then $x_k - at_k = 0$ and $\sum_{i=1}^{k-1} (x_i - at_i)\sigma(\alpha_i) = \beta$. If $t_i \le 0$ we have $x_i - at_i \ge 0$ because $a \ge 0$. If $t_i > 0$, then $x_i - at_i \ge x_i - (x_i/t_i)t_i = 0$. Thus $\xi' = \sum_{i=1}^{k-1} (x_i - at_i)\alpha_i$ is also feasible and expressible in terms of fewer elements of $A$. We can continue in this way until we obtain a basic feasible vector.

Now, suppose the canonical problem has a solution and that $\xi$ as given above is an optimal vector. If $\{\alpha_1, \ldots, \alpha_k\}$ is not a feasible subset of $A$, we can assume $x_i > 0$ for $i = 1, \ldots, k$ since otherwise $\xi$ could be expressed in terms of a smaller subset of $A$. Let $\psi$ be an optimum vector for the dual problem and let $\eta = \hat{\sigma}(\psi) - \phi$ be represented by $W = [w_1 \cdots w_n]$. By Theorem 3.4, $w_i = 0$ for $i = 1, \ldots, k$. It then follows that $\xi'$ obtained as above is also optimal. We can continue in this way until we obtain a basic feasible vector which is optimal. □
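The first half of the proof is an algorithm, and it can be carried out in coordinates. The sketch below is an illustration of the reduction in Theorem 3.5, not the simplex method; the helper name and the use of numpy are choices made here. It repeatedly removes a column from the support of a feasible solution of $AX = B$, $X \ge 0$, exactly as in the proof, until the columns in use are linearly independent.

```python
import numpy as np

def to_basic_feasible(A, X, tol=1e-10):
    """Reduce a feasible solution of AX = B, X >= 0 to a basic feasible one.

    While the columns of A carrying positive weight are linearly dependent,
    pick coefficients t with  sum_i t_i * a_i = 0  and some t_i > 0, and
    subtract a*t with a = min(x_i / t_i over t_i > 0), as in Theorem 3.5.
    """
    A = np.asarray(A, dtype=float)
    X = np.asarray(X, dtype=float).copy()
    while True:
        support = np.where(X > tol)[0]
        cols = A[:, support]
        if np.linalg.matrix_rank(cols, tol=tol) == len(support):
            return X                      # columns in use are independent
        # A null-space vector of the used columns gives the coefficients t.
        _, _, vh = np.linalg.svd(cols)
        t = vh[-1]
        if not np.any(t > tol):
            t = -t                        # make sure at least one t_i > 0
        a = min(X[j] / t[k] for k, j in enumerate(support) if t[k] > tol)
        for k, j in enumerate(support):
            X[j] -= a * t[k]
        X[np.abs(X) < tol] = 0.0          # clean up the component driven to 0

# Example: X = (1, 1, 1) is feasible for AX = B below, but not basic.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
B = A @ np.array([1.0, 1.0, 1.0])         # B = (2, 2)
Xb = to_basic_feasible(A, np.array([1.0, 1.0, 1.0]))
print(Xb, np.allclose(A @ Xb, B))         # a basic feasible vector, still AX = B
```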

If a linear programming problem has a solution, there remains the problem of finding it. Since there are only finitely many basic feasible vectors and at least one of them must be optimal, in concept we could try them all and take the one that yields the largest value of $\phi\xi$. It is not easy, however, to find even one basic feasible vector, or even one feasible vector, and for problems with large numbers of variables there would still be an enormous number of vectors to test. It is convenient to divide the process of finding a solution into two parts: to find a basic feasible vector, and to find an optimum vector when a basic feasible vector is known. We take up the second of these problems first because it is easier to describe the simplex method for it. It is then easy to modify the simplex method to handle the first problem.

Now, suppose that $B = \{\beta_1, \ldots, \beta_m\}$ is a basis of $V$ where $\beta_i = \sigma(\alpha_i)$ and $\{\alpha_1, \ldots, \alpha_m\}$ is a feasible subset of $A$. Let $\beta = \sum_{i=1}^{m} b_i\beta_i$. Then $\xi = \sum_{i=1}^{m} b_i\alpha_i$ is the corresponding basic feasible vector and $\phi\xi = \sum_{i=1}^{m} c_ib_i$. Suppose that a new coordinate system in $V$ is chosen in which only one basis element is replaced with a new one. Let $\beta_r = \sigma(\alpha_r)$ be replaced by $\beta_k = \sigma(\alpha_k)$. Since $\sigma$ is represented by $A = [a_{ij}]$ with respect to the bases $A$ and $B$ we have $\beta_k = \sum_{i=1}^{m} a_{ik}\beta_i$. Since $a_{rk} \ne 0$ we can solve for $\beta_r$ and obtain

$\beta_r = \frac{1}{a_{rk}}\left(\beta_k - \sum_{i=1, i\ne r}^{m} a_{ik}\beta_i\right)$.   (3.3)

Then

$\beta = \sum_{i=1}^{m} b_i\beta_i = \frac{b_r}{a_{rk}}\beta_k + \sum_{i=1, i\ne r}^{m}\left(b_i - b_r\frac{a_{ik}}{a_{rk}}\right)\beta_i$.   (3.4)

If we let

$b_k' = \frac{b_r}{a_{rk}}$ and $b_i' = b_i - b_r\frac{a_{ik}}{a_{rk}}$,   (3.5)

then another solution to the equation $\sigma(\xi) = \beta$ is

$\xi' = b_k'\alpha_k + \sum_{i=1, i\ne r}^{m} b_i'\alpha_i$.   (3.6)

Notice that although $\beta$ remains fixed and only its coordinates change, each particular choice of a basis leads to a different choice for the solution of the equation $\sigma(\xi) = \beta$.

We now wish to impose conditions so that $\xi'$ will be feasible. Since $b_r \ge 0$ we must have $a_{rk} > 0$ and, since $b_i' \ge 0$ is required, either $a_{ik} \le 0$ or $b_i/a_{ik} \ge b_r/a_{rk}$. This means $r$ is an index for which $a_{rk} > 0$ and $b_r/a_{rk}$ is the minimum of all $b_i/a_{ik}$ for which $a_{ik} > 0$. For the moment, suppose this is the case. Then $\xi'$ is also a basic feasible vector.

Now,

$\phi\xi' = c_kb_k' + \sum_{i=1, i\ne r}^{m} c_ib_i'$
$\qquad = c_k\frac{b_r}{a_{rk}} + \sum_{i=1, i\ne r}^{m} c_i\left(b_i - b_r\frac{a_{ik}}{a_{rk}}\right)$
$\qquad = \sum_{i=1}^{m} c_ib_i - c_rb_r + c_k\frac{b_r}{a_{rk}} - \sum_{i=1, i\ne r}^{m} c_ib_r\frac{a_{ik}}{a_{rk}}$
$\qquad = \phi\xi + \frac{b_r}{a_{rk}}\left(c_k - \sum_{i=1}^{m} c_ia_{ik}\right)$
$\qquad = \phi\xi + \frac{b_r}{a_{rk}}(c_k - d_k)$.   (3.7)

Thus $\phi\xi' \ge \phi\xi$ if $c_k - \sum_{i=1}^{m} c_ia_{ik} = c_k - d_k > 0$, and $\phi\xi' > \phi\xi$ if also $b_r > 0$.



The simplex method specifies the choice of the basis element $\beta_r = \sigma(\alpha_r)$ to be removed and the vector $\beta_k = \sigma(\alpha_k)$ to be inserted into the new basis in the following manner:

(1) Compute $\sum_{i=1}^{m} c_ia_{ij} = d_j$, for $j = 1, \ldots, n$.

(2) Select an index $k$ for which $c_k - d_k > 0$.

(3) For that $k$ select an index $r$ for which $a_{rk} > 0$ and $b_r/a_{rk}$ is the minimum of all $b_i/a_{ik}$ for which $a_{ik} > 0$.

(4) Replace $\beta_r = \sigma(\alpha_r)$ by $\beta_k = \sigma(\alpha_k)$.

(5) Express $\beta$ in terms of the new basis and determine $\xi'$.

(6) Determine the new matrix $A'$ representing $\sigma$.

There are two ways these replacement rules may fail to operate. There may be no index $k$ for which $c_k - d_k > 0$, and if there is such a $k$, there may be no index $r$ for which $a_{rk} > 0$. Let us consider the second possibility first. Suppose there is an index $k$ for which $c_k - d_k > 0$ and $a_{ik} \le 0$ for $i = 1, \ldots, m$. (In other words, if this situation should occur, we choose to ignore any other choice of the index $k$ for which the selection rules would operate.) Then $\xi = \alpha_k - \sum_{i=1}^{m} a_{ik}\alpha_i \ge 0$ and $\sigma(\xi) = \sigma(\alpha_k) - \sum_{i=1}^{m} a_{ik}\sigma(\alpha_i) = 0$. Also $\phi\xi = c_k - \sum_{i=1}^{m} c_ia_{ik} > 0$. Thus $\xi$ satisfies condition (1) of Theorem 2.14. Since condition (2) cannot then be satisfied, the dual problem is infeasible and the problem has no solution.

Let us now assume the dual problem is feasible so that the selection prescribed in step (3) is always possible. With the new basic feasible vector $\xi'$ obtained, the replacement rules can be applied again, and again. Since there are only finitely many basic feasible vectors, this sequence of replacements must eventually terminate at a point where the selection prescribed in step (2) is not possible, or a finite set of basic feasible vectors may be obtained over and over without termination.

It was pointed out above that $\phi\xi' \ge \phi\xi$, and $\phi\xi' > \phi\xi$ if $b_r > 0$. This means that if $b_r > 0$ at any replacement, then we can never return to that particular feasible vector. There are finitely many subspaces of $V$ spanned by $m - 1$ or fewer vectors in $\sigma(A)$. If $\beta$ lies in one of these subspaces, the linear programming problem is said to be degenerate. If the problem is not degenerate, then no basic feasible vector can be expressed in terms of $m - 1$ or fewer vectors in $A$. Under these conditions, for each basic feasible vector every $b_i > 0$. Thus, if the problem is not degenerate, infinite repetition is not possible and the replacements must terminate.

Unfortunately, many practical problems are degenerate. A special replacement procedure can be devised which makes infinite repetition impossible. However, a large amount of experience indicates that it is very difficult to devise a problem for which the replacement procedure given above will not terminate.

Now, suppose $c_j - d_j \le 0$ for $j = 1, \ldots, n$. Let $\hat{B} = \{\psi_1, \ldots, \psi_m\}$ be the basis in $\hat{V}$ dual to $B$, and consider $\psi = \sum_{i=1}^{m} c_i\psi_i$. Then $\hat{\sigma}(\psi) = \sum_{i=1}^{m} c_i\hat{\sigma}(\psi_i) = \sum_{i=1}^{m} c_i\left(\sum_{j=1}^{n} a_{ij}\hat{\alpha}_j\right) = \sum_{j=1}^{n}\left(\sum_{i=1}^{m} c_ia_{ij}\right)\hat{\alpha}_j = \sum_{j=1}^{n} d_j\hat{\alpha}_j \ge \sum_{j=1}^{n} c_j\hat{\alpha}_j = \phi$. Thus $\psi$ is feasible for the canonical dual linear programming problem. But with this $\psi$ we have $\psi\beta = \left(\sum_{i=1}^{m} c_i\psi_i\right)\left(\sum_{k=1}^{m} b_k\beta_k\right) = \sum_{i=1}^{m} c_ib_i$, and $\phi\xi = \left(\sum_{j=1}^{n} c_j\hat{\alpha}_j\right)\left(\sum_{i=1}^{m} b_i\alpha_i\right) = \sum_{i=1}^{m} c_ib_i$. This shows that $\psi\beta = \phi\xi$. Since both $\xi$ and $\psi$ are feasible, this means that both are optimal. Thus optimal solutions to both the primal and dual problems are obtained when the replacement procedure prescribed by the simplex method terminates.

It is easy to formulate the steps of the simplex method into an effective computational procedure. First, we shall establish formulas by which we can compute the components of the new matrix $A'$ representing $\sigma$.

$\sigma(\alpha_j) = \sum_{i=1}^{m} a_{ij}\beta_i$
$\qquad = a_{rj}\cdot\frac{1}{a_{rk}}\left(\beta_k - \sum_{i=1, i\ne r}^{m} a_{ik}\beta_i\right) + \sum_{i=1, i\ne r}^{m} a_{ij}\beta_i$
$\qquad = \frac{a_{rj}}{a_{rk}}\beta_k + \sum_{i=1, i\ne r}^{m}\left(a_{ij} - \frac{a_{rj}}{a_{rk}}a_{ik}\right)\beta_i$.   (3.8)

Thus,

$a_{kj}' = \frac{a_{rj}}{a_{rk}}, \qquad a_{ij}' = a_{ij} - \frac{a_{rj}}{a_{rk}}a_{ik} \quad (i \ne r)$.   (3.9)

It turns out to be more convenient to compute the new $c_j - d_j'$ directly:

$c_j - d_j' = -\sum_{i=1, i\ne r}^{m} c_ia_{ij}' - c_ka_{kj}' + c_j$
$\qquad = -\sum_{i=1, i\ne r}^{m} c_i\left(a_{ij} - \frac{a_{rj}}{a_{rk}}a_{ik}\right) - c_k\frac{a_{rj}}{a_{rk}} + c_j$
$\qquad = (c_j - d_j) - \frac{a_{rj}}{a_{rk}}(c_k - d_k)$.   (3.10)

For immediate comparison, we rewrite formulas (3.5) and (3.7),

$b_k' = \frac{b_r}{a_{rk}}, \qquad b_i' = b_i - \frac{b_r}{a_{rk}}a_{ik}$,   (3.5)

$\phi\xi' = \phi\xi + \frac{b_r}{a_{rk}}(c_k - d_k)$.   (3.7)



The similarity between formulas (3.5), (3.7), (3.9), and (3.10) suggests that simple rules can be devised to include them as special cases. It is convenient to write all the relevant numbers in an array of the following form:

                         c_1     ...     c_k     ...     c_j     ...     c_n
    alpha_1   c_1   |   a_11    ...     a_1k    ...     a_1j    ...     a_1n   |   b_1
      ...     ...   |    ...             ...             ...             ...   |   ...
    alpha_r   c_r   |   a_r1    ...    (a_rk)   ...     a_rj    ...     a_rn   |   b_r
      ...     ...   |    ...             ...             ...             ...   |   ...
    alpha_m   c_m   |   a_m1    ...     a_mk    ...     a_mj    ...     a_mn   |   b_m
                         d_1     ...     d_k     ...     d_j     ...     d_n       phi*xi
                       c_1-d_1   ...   c_k-d_k   ...   c_j-d_j   ...   c_n-d_n               (3.11)

The array within the rectangle is the augmented matrix of the system of equations $AX = B$. The first column in front of the rectangle gives the identity of the basis element in the feasible subset of $A$, and the second column contains the corresponding values of $c_i$. These are used to compute the values of $d_j$ $(j = 1, \ldots, n)$ and $\phi\xi$ below the rectangle. The row $[\cdots c_k \cdots c_j \cdots]$ is placed above the rectangle to facilitate computation of the $c_j - d_j$ $(j = 1, \ldots, n)$. Since this top row does not change when the basis is changed, it is usually placed at the top of a page of work for reference and not carried along with the rest of the work. This array has become known as a tableau.

The selection rules of the simplex method and the formulas (3.5), (3.7), (3.9), and (3.10) can now be formalized as follows:

(1) Select an index k for which c_k − d_k > 0.

(2) For that k select an index r for which a_{rk} > 0 and b_r/a_{rk} is the minimum of all b_i/a_{ik} for which a_{ik} > 0.

(3) Divide row r within the rectangle by a_{rk}, relabel this row "row k", and replace c_r by c_k.

(4) Multiply the new row k by a_{ik} and subtract the result from row i. Similarly, multiply row k by (c_k − d_k) and subtract from the bottom row outside the rectangle.



Similar rules do not apply to the row [··· d_k ··· d_j ···]. Once the




bottom row has been computed this row can be omitted from subsequent 
tableaux, or computed independently each time as a check on the accuracy 
of the computation. The operation described in steps (3) and (4) is known 
as the pivot operation, and the element a_{rk} is called the pivot element. In the
tableau above the pivot element has been encircled, a practice that has been 
found helpful in keeping the many steps of the pivot operation in order. 

The elements c_j − d_j (j = 1, ..., n) appearing in the last row are called indicators. The simplex method terminates when all indicators are ≤ 0.
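To make rules (1) through (4) concrete, here is a small computational sketch in Python; it is my own illustration, not part of the text. The array T is laid out as in (3.11): the m rows of the rectangle, each ending with its entry of B, followed by the bottom row of indicators c_j − d_j, whose last entry carries −φξ so that the same row rule applies to it. The names simplex and pivot and the tolerance are illustrative choices, and the sketch does not guard against the cycling that degeneracy can cause.

    import numpy as np

    def pivot(T, basis, r, k):
        """Steps (3) and (4): divide row r by a_rk, clear the rest of column k,
        and relabel row r as the row of the new basis element k."""
        T[r] = T[r] / T[r, k]
        for i in range(T.shape[0]):
            if i != r:
                T[i] = T[i] - T[i, k] * T[r]
        basis[r] = k

    def simplex(A, B, C, basis):
        """A is m x n, B has length m, C has length n; basis lists the indices of
        an initial basic feasible set whose columns of A form an identity matrix."""
        A, B, C = (np.asarray(M, dtype=float) for M in (A, B, C))
        m, n = A.shape
        T = np.zeros((m + 1, n + 1))
        T[:m, :n], T[:m, n] = A, B
        T[m, :n] = C - C[basis] @ A          # indicators c_j - d_j
        T[m, n] = -C[basis] @ B              # carries -(phi xi)
        while True:
            k = int(np.argmax(T[m, :n]))     # rule (1): pick some c_k - d_k > 0
            if T[m, k] <= 1e-12:
                return T, basis              # every indicator <= 0: optimal
            rows = [i for i in range(m) if T[i, k] > 1e-12]
            if not rows:
                raise ValueError("the objective is unbounded")
            r = min(rows, key=lambda i: T[i, n] / T[i, k])   # rule (2)
            pivot(T, basis, r, k)

When the loop stops, T[i, -1] is the value of the basic variable named by basis[i] and −T[m, -1] is the maximum value of the objective.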

Suppose a solution has been obtained in which B' = {β'_1, ..., β'_m} is the final basis of V obtained. Let B̂' = {ψ'_1, ..., ψ'_m} be the corresponding dual basis. As we have seen, an optimum vector for the dual problem is obtained by setting ψ = Σ_{k=1}^m c_{i(k)} ψ'_k, where i(k) is the index of the element of A mapped onto β'_k, that is, σ(α_{i(k)}) = β'_k. By definition of the matrix A' = [a'_{ij}] representing σ with respect to the bases A and B', we have β_j = σ(α_j) = Σ_{k=1}^m a'_{kj} β'_k. Thus, the elements in the first m columns of A' are the elements of the matrix of transition from the basis B' to the basis B. This means that ψ'_k = Σ_{j=1}^m a'_{kj} ψ_j, and

ψ = Σ_{k=1}^m c_{i(k)} ψ'_k = Σ_{k=1}^m c_{i(k)} ( Σ_{j=1}^m a'_{kj} ψ_j )
  = Σ_{j=1}^m ( Σ_{k=1}^m c_{i(k)} a'_{kj} ) ψ_j
  = Σ_{j=1}^m d'_j ψ_j.                                                                       (3.12)

This means that D' = [d'_1 ··· d'_m] is the representation of the optimal vector for the solution to the dual problem in the original coordinate system.
All this discussion has been based on the premise that we can start with a known basic feasible vector. However, it is not a trivial matter to obtain even one feasible vector for most problems. The ith equation in AX = B is

Σ_{j=1}^n a_{ij} x_j = b_i.                                                                   (3.13)

Since we can multiply both sides of this equation by −1 if necessary, there is no loss of generality in assuming that each b_i ≥ 0. We then replace equation (3.13) by

Σ_{j=1}^n a_{ij} x_j + v_i = b_i.                                                             (3.14)

It is very easy to obtain a basic feasible solution of the corresponding system of linear equations; take x_1 = ··· = x_n = 0 and v_i = b_i. We then construct a new objective function

Σ_{j=1}^n c_j x_j − M Σ_{i=1}^m v_i,                                                          (3.15)






where M is taken as a number very much larger than any number to be considered; that is, so large that this new objective function cannot be maximized unless v_1 = ··· = v_m = 0. The natural working of the simplex method will soon bring this about if the original problem is feasible. At this point the columns of the tableaux associated with the newly introduced variables could be dropped from further consideration, since a basic feasible solution to the original problem will have been obtained. However, it is better to retain these columns since they will provide the matrix of transition by which the coordinates of the optimum vector for the dual problem can be computed. This provides the best check on the accuracy of the computation, since the optimality of the proposed solution can be tested unequivocally by using Theorem 3.4 or by comparing the values of φξ and ψβ.
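As a small illustration of this device (again a sketch of my own, not the book's), the following routine adjoins the artificial variables v_i and builds the objective (3.15), so that the pivot loop sketched earlier can be started from the obvious basic feasible solution; the particular value of M is an arbitrary large constant.

    import numpy as np

    def big_m_setup(A, B, C, M=1e6):
        """Adjoin artificial variables v_1, ..., v_m to AX = B and penalize them
        by -M in the objective, as in (3.15).  Rows with b_i < 0 are first
        multiplied by -1 so that every right-hand side is non-negative."""
        A = np.asarray(A, dtype=float)
        B = np.asarray(B, dtype=float)
        m, n = A.shape
        sign = np.where(B < 0, -1.0, 1.0)
        A_aug = np.hstack([A * sign[:, None], np.eye(m)])
        B_aug = B * sign
        C_aug = np.concatenate([np.asarray(C, dtype=float), -M * np.ones(m)])
        basis = list(range(n, n + m))        # the artificial columns start as the basis
        return A_aug, B_aug, C_aug, basis

Running simplex(*big_m_setup(A, B, C)) drives every v_i to zero whenever the original problem is feasible, and the retained artificial columns of the final tableau supply the transition matrix mentioned above.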

BIBLIOGRAPHICAL NOTES 

Because of the importance of linear programming in economics and industrial engineering,
books and articles on the subject are very numerous. Most are filled with bewildering, 
tedious numerical calculations which add almost nothing to understanding and make 
the subject look difficult. Linear programming is not difficult, but it is subtle and requires 
clarity of exposition. D. Gale, The Theory of Linear Economic Models, is particularly 
recommended for its clarity and interesting, though simple, examples. 



EXERCISES 

1. Formulate the standard primal and dual linear programming problems in 
matric form. 

2. Formulate the canonical primal and dual linear programming problems in 
matric form. 

3. Let A = [a_{ij}] be an m × n matrix. Let A_{11} be the matrix formed from the first r rows and first s columns of A; let A_{12} be the matrix formed from the first r rows and last n − s columns of A; let A_{21} be the matrix formed from the last m − r rows and first s columns of A; and let A_{22} be the matrix formed from the last m − r rows and last n − s columns of A. We then write

A = [ A_{11}  A_{12} ]
    [ A_{21}  A_{22} ].

We say that we have partitioned A into the designated submatrices. Using this notation, show that the matrix equation AX = B is equivalent to the matrix inequality

[  A ]        [  B ]
[ −A ] X  ≤  [ −B ].

4. Use Exercise 3 to show the equivalence of the standard primal linear pro- 
gramming problem and the canonical primal linear programming problem. 




5. Using the notation of Exercise 3, show that the dual canonical linear programming problem is to minimize

(Z_1 − Z_2)B = [Z_1  Z_2] [  B ]
                          [ −B ]

subject to the constraint

[Z_1  Z_2] [  A ]  ≥  C
           [ −A ]

and [Z_1  Z_2] ≥ 0. Show that if we let Y = Z_1 − Z_2, the dual canonical linear programming problem is to minimize YB subject to the constraint YA ≥ C without the condition Y ≥ 0.

6. Let A, B, and C be the matrices given in the standard maximum linear 
programming problem. Let F be the smallest field containing all the elements 
appearing in A, B, and C. Show that if the problem has an optimal solution, the 
simplex method gives an optimal vector all of whose components are in F, and 
that the maximum value of CX is in F. 

7. How should the simplex method be modified to handle the canonical minimum linear programming problem in the form: minimize CX subject to the constraints AX = B and X ≥ 0?

8. Find (x_1, x_2) ≥ 0 which maximizes 5x_1 + 2x_2 subject to the conditions

2x_1 + x_2 ≤ 6
4x_1 + x_2 ≤ 10
−x_1 + x_2 ≤ 3.

9. Find [y_1  y_2  y_3] ≥ 0 which minimizes 6y_1 + 10y_2 + 3y_3 subject to the conditions

y_1 + y_2 + y_3 ≥ 2
2y_1 + 4y_2 − y_3 ≥ 5.

(In this exercise take advantage of the fact that this is the problem dual to the 
problem in the previous exercise, which has already been worked.) 

10. Sometimes it is easier to apply the simplex method to the dual problem 
than it is to apply the simplex method to the given primal problem. Solve the 
problem in Exercise 9 by applying the simplex method to it directly. Use this work 
to find a solution to the problem in Exercise 8. 

11. Find (x_1, x_2) ≥ 0 which maximizes x_1 + 2x_2 subject to the conditions

−2x_1 + x_2 ≤ 2
−x_1 + x_2 ≤ 3
x_1 + x_2 ≤ 7
2x_1 + x_2 ≤ 11.



12. Draw the lines

−2x_1 + x_2 = 2
−x_1 + x_2 = 3
x_1 + x_2 = 7
2x_1 + x_2 = 11




in the x_1, x_2-plane. Notice that these lines are the extremes of the conditions given in Exercise 11. Locate the set of points which satisfy all the inequalities of Exercise 11 and the condition (x_1, x_2) ≥ 0. The corresponding canonical problem involves the linear conditions

−2x_1 + x_2 + x_3 = 2
−x_1 + x_2 + x_4 = 3
x_1 + x_2 + x_5 = 7
2x_1 + x_2 + x_6 = 11.

The first feasible solution is (0, 0, 2, 3, 7, 11) and this corresponds to the point (0, 0) in the x_1, x_2-plane. In solving the problem of Exercise 11, a sequence of feasible solutions is obtained. Plot the corresponding points in the x_1, x_2-plane.

13. Show that the geometric set of points satisfying all the linear constraints of a standard linear programming problem is a convex set. Let this set be denoted by C. A vector in C is called an extreme vector if it is not a convex linear combination of other vectors in C. Show that if φ is a linear functional and β is a convex linear combination of the vectors {α_1, ..., α_r}, then φ(β) ≤ max {φ(α_i)}. Show that if β is not an extreme vector in C, then either φ(ξ) does not take on its maximum value at β in C, or there are other vectors in C at which φ(ξ) takes on its maximum value. Show that if C is closed and has only a finite number of extreme vectors, then the maximum value of φ(ξ) occurs at an extreme vector of C.

14. Theorem 3.2 provides an easily applied test for optimality. Let X = (x_1, ..., x_n) be feasible for the standard primal problem and let Y = [y_1 ··· y_m] be feasible for the standard dual problem. Show that X and Y are both optimal if and only if x_j > 0 implies Σ_{i=1}^m y_i a_{ij} = c_j and y_i > 0 implies Σ_{j=1}^n a_{ij} x_j = b_i.

15. Consider the problem of maximizing 2x_1 + x_2 subject to the conditions

X-t ~~ £Xn ^^ J 

x_1 ≤ 9
2x_1 + x_2 ≤ 20
x_1 + 3x_2 ≤ 30.

Consider X = (8, 4), Y = [0  0  1  0]. Test both for feasibility for the primal and dual problems, and test for optimality.

16. Show how to use the simplex method to find a non-negative solution of AX = B. This is also equivalent to the problem of finding a feasible solution for a canonical linear programming problem. (Hint: Take Z = (z_1, ..., z_m), F = [1 1 ··· 1], and consider the problem of minimizing FZ subject to AX + Z = B. What is the resulting necessary and sufficient condition for the existence of a solution to the original problem?)




17. Apply the simplex method to the problem of finding a non-negative solution 
of 

6x_1 + 3x_2 − 4x_3 − 9x_4 − 7x_5 − 5x_6 = 0
−5x_1 − 8x_2 + 8x_3 + 2x_4 − 2x_5 + 5x_6 = 0
−7x_1 + 6x_2 − 5x_3 + 8x_4 + 8x_5 + x_6 = 0
x_1 + x_2 + x_3 + x_4 + x_5 + x_6 = 1.

This is equivalent to Exercise 5 of Section 1. 

4 | Applications to Communication Theory

This section requires no more linear algebra than the concepts of a basis 
and the change of basis. The material in the first four sections of Chapter I 
and the first four sections of Chapter II is sufficient. It is also necessary that 
the reader be familiar with the formal elementary properties of Fourier series. 

Communication theory is concerned largely with signals which are uncer- 
tain, uncertain to be transmitted and uncertain to be received. Therefore, 
a large part of the theory is based on probability theory. However, there are 
some important concepts in the theory which are purely of a vector space 
nature. One is the sampling theorem, which says that in a certain class of 
signals a particular signal is completely determined by its values (samples) 
at an equally spaced set of times extending forever. 

Although it is usually not stated explicitly, the set of functions considered 
as signals forms a vector space over the real numbers; that is, if f(t) and g(t) are signals, then (f + g)(t) = f(t) + g(t) is a signal and (af)(t) = af(t),
where a is a real number, is also a signal. Usually the vector space of signals 
is infinite dimensional so that while many of the concepts and theorems 
developed in this book apply, there are also many that do not. In many 
cases the appropriate tool is the theory of Fourier integrals. In order to bring 
the topic within the context of this book, we assume that the signals persist 
for only a finite interval of time and that there is a bound for the highest 
frequency that will be encountered. If the time interval is of length 1 , this 
assumption has the implication that each signal f(t) can be represented as a finite series of the form

f(t) = ½ a_0 + Σ_{k=1}^N a_k cos 2πkt + Σ_{k=1}^N b_k sin 2πkt.                               (4.1)

Formula (4.1) is in fact just a precise formulation of the vague statement that the highest frequency to be encountered is bounded. Since the coefficients can be taken to be arbitrary real numbers, the set of signals under consideration forms a real vector space V of dimension 2N + 1. We show that f(t) is determined by its values at 2N + 1 points equally spaced in time.
This statement is known as the finite sampling theorem. 




The classical infinite sampling theorem from communication theory 
requires an assumption analogous to the assumption that the highest fre- 
quencies are bounded. Only the assumption that the signal persists for a finite 
interval of time is relaxed. In any practical problem some bound can be 
placed on the duration of the family of signals under consideration. Thus, the 
restriction on the length of the time interval does not alter the significance 
or spirit of the theorem in any way. 

Consider the function

ψ(t) = (1/(2N + 1)) ( 1 + 2 Σ_{k=1}^N cos 2πkt ) ∈ V.                                         (4.2)

Multiplying numerator and denominator by sin πt, we have

ψ(t) = [ sin πt + Σ_{k=1}^N 2 cos 2πkt sin πt ] / [ (2N + 1) sin πt ]
     = [ sin πt + Σ_{k=1}^N { sin (2πkt + πt) − sin (2πkt − πt) } ] / [ (2N + 1) sin πt ]
     = sin (2N + 1)πt / [ (2N + 1) sin πt ].                                                  (4.3)

From (4.2) we see that ψ(0) = 1, and from (4.3) we see that ψ(j/(2N + 1)) = 0 for 0 < |j| ≤ N.

Consider the functions

ψ_k(t) = ψ( t − k/(2N + 1) )     for k = −N, −N + 1, ..., N.                                   (4.4)

These 2N + 1 functions are all members of V. Furthermore, for t_j = j/(2N + 1) we see that ψ_j(t_j) = 1 while ψ_k(t_j) = 0 for k ≠ j. Thus, the 2N + 1 functions obtained are linearly independent. Since V is of dimension 2N + 1, it follows that the set {ψ_k(t) | k = −N, ..., N} is a basis of V. These functions are called the sampling functions.

If f(t) is any element of V it can be written in the form

f(t) = Σ_{k=−N}^N d_k ψ_k(t).                                                                 (4.5)

However,

f(t_j) = Σ_{k=−N}^N d_k ψ_k(t_j) = d_j,                                                       (4.6)

or

f(t) = Σ_{j=−N}^N f(t_j) ψ_j(t).                                                              (4.7)



Thus, the coordinates of f(t) with respect to the basis {ψ_k(t)} are (f(t_{−N}), ..., f(t_N)), and we see that these samples are sufficient to determine f(t).
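The finite sampling theorem is easy to check numerically. The following sketch is my own illustration, not part of the text: it builds a random signal of the form (4.1), samples it at the 2N + 1 points t_k = k/(2N + 1), and reconstructs it by (4.7) using the sampling functions (4.4).

    import numpy as np

    N = 4

    def psi(t):
        """Sampling function (4.3): sin((2N+1)pi t) / ((2N+1) sin(pi t)), with value 1 at t = 0."""
        t = np.asarray(t, dtype=float)
        den = (2 * N + 1) * np.sin(np.pi * t)
        num = np.sin((2 * N + 1) * np.pi * t)
        return np.where(np.isclose(den, 0.0), 1.0, num / np.where(den == 0.0, 1.0, den))

    rng = np.random.default_rng(0)
    a0, a, b = rng.normal(), rng.normal(size=N), rng.normal(size=N)
    k = np.arange(1, N + 1)

    def f(t):
        """A signal of the form (4.1)."""
        return (a0 / 2 + np.sum(a * np.cos(2 * np.pi * k * t))
                       + np.sum(b * np.sin(2 * np.pi * k * t)))

    t_k = np.arange(-N, N + 1) / (2 * N + 1)          # the 2N + 1 sample points
    samples = np.array([f(t) for t in t_k])

    def f_rec(t):
        """Reconstruction (4.7): sum over j of f(t_j) psi_j(t), with psi_j(t) = psi(t - t_j)."""
        return np.sum(samples * psi(t - t_k))

    print(abs(f(0.123) - f_rec(0.123)))               # agrees to rounding error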
It is of some interest to express the elements of the basis {½, cos 2πt, ..., sin 2πNt} in terms of the basis {ψ_k(t)}.

½ = Σ_{k=−N}^N ½ ψ_k(t)

cos 2πjt = Σ_{k=−N}^N cos 2πjt_k ψ_k(t)                                                       (4.8)

sin 2πjt = Σ_{k=−N}^N sin 2πjt_k ψ_k(t).

Expressing the elements of the basis {ψ_k(t)} in terms of the basis {½, cos 2πt, ..., sin 2πNt} is but a matter of the definition of the ψ_k(t):

ψ_k(t) = (1/(2N + 1)) [ 1 + 2 Σ_{j=1}^N cos 2πj( t − k/(2N + 1) ) ]
       = (1/(2N + 1)) [ 1 + 2 Σ_{j=1}^N cos 2πjt_k cos 2πjt + 2 Σ_{j=1}^N sin 2πjt_k sin 2πjt ].
                                                                                              (4.9)

With this interpretation, formula (4.1) is a representation of f(t) in one coordinate system and (4.7) is a representation of f(t) in another. To express the coefficients in (4.1) in terms of the coefficients in (4.7) is but a change of coordinates. Thus, we have

a_j = (2/(2N + 1)) Σ_{k=−N}^N f(t_k) cos ( 2πjk/(2N + 1) ) = (2/(2N + 1)) Σ_{k=−N}^N f(t_k) cos 2πjt_k
                                                                                              (4.10)
b_j = (2/(2N + 1)) Σ_{k=−N}^N f(t_k) sin ( 2πjk/(2N + 1) ) = (2/(2N + 1)) Σ_{k=−N}^N f(t_k) sin 2πjt_k.

There are several ways to look at formulas (4.10). Those familiar with the theory of Fourier series will see the a_j and b_j as Fourier coefficients, with formulas (4.10) using finite sums instead of integrals. Those familiar with probability theory will see the a_j as covariance coefficients between the samples of f(t) and the samples of cos 2πjt at the times t_k. And we have just viewed them as formulas for a change of coordinates.
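Continuing the numerical sketch above (again an illustration of mine, reusing the arrays samples and t_k defined there), the coefficients of (4.1) are recovered from the same 2N + 1 samples by the finite sums (4.10):

    # a_j and b_j from the samples, following (4.10)
    a_rec = np.array([2 / (2 * N + 1) * np.sum(samples * np.cos(2 * np.pi * j * t_k))
                      for j in range(N + 1)])
    b_rec = np.array([2 / (2 * N + 1) * np.sum(samples * np.sin(2 * np.pi * j * t_k))
                      for j in range(N + 1)])
    print(np.allclose(a_rec[0], a0), np.allclose(a_rec[1:], a), np.allclose(b_rec[1:], b))

The index j = 0 returns a_0 itself, which is why (4.1) carries the factor ½ in front of a_0.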

If the time interval had been of length T instead of 1 , the series correspond- 
ing to (4.1) would be of the form 

f(t) = ½ a_0 + Σ_{k=1}^N a_k cos (2πk/T) t + Σ_{k=1}^N b_k sin (2πk/T) t.                     (4.11)




The vector space would still be of dimension 2N + 1 and we would need 
2N + 1 samples spread equally over an interval of length T, or (2N + 1)/T
samples per unit time. Since N/T = W is the highest frequency present in 
the series (4.11), we see that for large intervals of time approximately 2W 
samples per unit time are required to determine the signal. The infinite 
sampling theorem, referred to at the beginning of this section, says that if 
W is the highest frequency present, then 2W samples per second suffice 
to determine the signal. The spirit of the finite sampling theorem is in keeping 
with the spirit of the infinite sampling theorem, but the finite sampling 
theorem has the practical advantage of providing effective formulas for 
determining the function f(t) and the Fourier coefficients from the samples. 

BIBLIOGRAPHICAL NOTES

For a statement and proof of the infinite sampling theorem see P. M. Woodward, 
Probability and Information Theory, with Applications to Radar. 

EXERCISES 

1. Show that ψ(t_r − t_s) = ψ_s(t_r) = δ_{rs} for −N ≤ r, s ≤ N.

2. Show that if f(t) can be represented in the form of (4.1), then

a_k = 2 ∫_{−1/2}^{1/2} f(t) cos 2πkt dt,     k = 0, 1, ..., N,

b_k = 2 ∫_{−1/2}^{1/2} f(t) sin 2πkt dt,     k = 1, ..., N.

3. Show that ∫_{−1/2}^{1/2} ψ(t) dt = 1/(2N + 1).

4. Show that ∫_{−1/2}^{1/2} ψ_k(t) dt = 1/(2N + 1).

5. Show that if f(t) can be represented in the form (4.7), then

∫_{−1/2}^{1/2} f(t) dt = (1/(2N + 1)) Σ_{k=−N}^N f(t_k).

This is a formula for expressing an integral as a finite sum. Such a formula is called a mechanical quadrature. Such formulas are characteristic of the theory of orthogonal functions.

6. Show that

Σ_{k=−N}^N (1/(2N + 1)) cos 2πrt_k = δ_{r0}

and

Σ_{k=−N}^N (1/(2N + 1)) sin 2πrt_k = 0.

7. Show that

Σ_{k=−N}^N (1/(2N + 1)) cos 2πr(t − t_k) = δ_{r0}

and

Σ_{k=−N}^N (1/(2N + 1)) sin 2πr(t − t_k) = 0.

8. Show that

Σ_{k=−N}^N ψ_k(t) = 1.

9. Show that

Σ_{k=−N}^N ψ_k(t)² = 1.

10. If f(t) and g(t) are integrable over the interval [−½, ½], let

(f, g) = 2 ∫_{−1/2}^{1/2} f(t)g(t) dt.

Show that if f(t) and g(t) are elements of V, then (f, g) defines an inner product in V. Show that {1/√2, cos 2πt, ..., sin 2πNt} is an orthonormal set.

11. Show that if f(t) can be represented in the form (4.1), then

(f, f) = a_0²/2 + Σ_{k=1}^N a_k² + Σ_{k=1}^N b_k².

Show that this is Parseval's equation of Chapter V.



12. Show that

∫_{−1/2}^{1/2} cos 2πrt ψ(t) dt = 1/(2N + 1).

13. Show that

∫_{−1/2}^{1/2} cos 2πrt ψ_k(t) dt = (1/(2N + 1)) cos 2πrt_k.

14. Show that

∫_{−1/2}^{1/2} sin 2πrt ψ_k(t) dt = (1/(2N + 1)) sin 2πrt_k.

15. Show that

∫_{−1/2}^{1/2} ψ_r(t) ψ_k(t) dt = (1/(2N + 1)) δ_{rk}.

16. Using the inner product defined in Exercise 10 show that {ψ_k(t) | k = −N, ..., N} is an orthonormal set.




17. Show that if f(t) can be represented in the form (4.7), then

∫_{−1/2}^{1/2} f(t)² dt = (1/(2N + 1)) Σ_{k=−N}^N f(t_k)².

Show that this is Parseval's equation of Chapter V.

18. Show that if f(t) can be represented in the form (4.7), then

∫_{−1/2}^{1/2} f(t) ψ_k(t) dt = (1/(2N + 1)) f(t_k).

Show how the formulas in Exercises 12, 13, 14, and 15 can be treated as special cases of this formula.

19. Let f(t) be any function integrable on [−½, ½]. Define

d_k = (2N + 1) ∫_{−1/2}^{1/2} f(t) ψ_k(t) dt.

Then f_N(t) = Σ_{k=−N}^N d_k ψ_k(t) ∈ V. Show that

∫_{−1/2}^{1/2} f(t) ψ_r(t) dt = ∫_{−1/2}^{1/2} f_N(t) ψ_r(t) dt,     r = −N, ..., N.

Show that if g(t) ∈ V, then

∫_{−1/2}^{1/2} f(t)g(t) dt = ∫_{−1/2}^{1/2} f_N(t)g(t) dt.

20. Show that if f_N(t) is defined as in Exercise 19, then

∫_{−1/2}^{1/2} [f(t) − f_N(t)]² dt = ∫_{−1/2}^{1/2} f(t)² dt − ∫_{−1/2}^{1/2} f_N(t)² dt.

Show that

∫_{−1/2}^{1/2} f_N(t)² dt ≤ ∫_{−1/2}^{1/2} f(t)² dt.

21. Let g(t) be any function in V. Show that

∫_{−1/2}^{1/2} [f(t) − g(t)]² dt − ∫_{−1/2}^{1/2} [f(t) − f_N(t)]² dt = ∫_{−1/2}^{1/2} [f_N(t) − g(t)]² dt.

Show that

∫_{−1/2}^{1/2} [f(t) − f_N(t)]² dt ≤ ∫_{−1/2}^{1/2} [f(t) − g(t)]² dt.

Because of this inequality we say that, of all functions in V, f_N(t) is the best approximation of f(t) in the mean; that is, it is the approximation which minimizes the integrated square error. f_N(t) is the function closest to f(t) in the metric defined by the inner product of Exercise 10.




22. Again, let f(t) be a function integrable on [−½, ½]. Define

Let ε > 0 be given. Show that there is an N(ε) such that if N > N(ε), then

∫_{−1/2}^{1/2} f(t) dt − ε < ∫_{−1/2}^{1/2} F_N(t) dt < ∫_{−1/2}^{1/2} f(t) dt + ε.

5 | Vector Calculus

We assume that the reader is familiar with elementary calculus, which 
considers scalar-valued functions of a scalar variable and scalar-valued 
functions of several scalar variables. Functions of several variables can also 
be taken up from a vector point of view in which a set of scalar variables is 
replaced by a vector. The background in linear algebra required is covered 
in this text in the first three sections of Chapter IV and Sections 1 and 3 of 
Chapter V. 

In this section we consider vector-valued functions of a vector variable. 
We assume that the reader is acquainted in a different setting with most of 
the topics mentioned in this section. We emphasize the algebraic aspects of 
these topics and state or prove only those assertions of an analytic nature 
that are intimately linked with the algebraic structure. 

In this section we assume that V is a vector space of finite dimension n over 
the real numbers or the complex numbers, and that a positive definite inner 
product is defined in V. We shall write (α, β) for the inner product of α, β ∈ V. In Section 3 of Chapter V we showed that for any linear functional φ ∈ V̂, there exists a unique η ∈ V such that φ(β) = (η, β) for all β ∈ V. We showed there that the mapping of φ onto η(φ) = η defined in this way is one-to-one and onto.

We can use this mapping to define an inner product in V̂. Thus, we define

(φ, ψ) = (η(ψ), η(φ)).                                                                        (5.1)

The conjugate appears in this definition because the mapping η is conjugate linear and we require that the inner product be linear in the second variable. It is not difficult to show that this does, in fact, define an inner product in V̂. For the norm in V̂ we have

‖φ‖² = (φ, φ) = (η(φ), η(φ)) = ‖η(φ)‖².                                                        (5.2)

From Schwarz's inequality we obtain

|φ(β)| = |(η(φ), β)| ≤ ‖η(φ)‖ · ‖β‖ = ‖φ‖ · ‖β‖.                                               (5.3)




Theorem 5.1. ‖φ‖ is the smallest value of M for which |φ(β)| ≤ M ‖β‖ for all β ∈ V.

proof. (5.3) shows that |φ(β)| ≤ M ‖β‖ holds for all β if M = ‖φ‖. Let β = η(φ). Then |φ(β)| = |(η(φ), β)| = (β, β) = ‖β‖² = ‖φ‖ · ‖β‖. Thus, the inequality |φ(β)| ≤ M ‖β‖ cannot hold for all values of β if M < ‖φ‖. □

Note. Although it was not pointed out explicitly, we have also shown that for each φ such a smallest value of M exists. When any value of M exists such that |φ(β)| ≤ M ‖β‖ for all β, we say that φ is bounded. Therefore, we have shown that every linear functional is bounded. In infinite dimensional vector spaces there may be linear functionals that are not bounded.

If f is any function mapping U into V, we define

lim_{ξ→ξ_0} f(ξ) = α                                                                          (5.4)

to be equivalent to the following statement: "For any ε > 0, there is a δ > 0 such that ‖ξ − ξ_0‖ < δ implies ‖f(ξ) − α‖ < ε." The function f is said to be continuous at ξ_0 if

lim_{ξ→ξ_0} f(ξ) = f(ξ_0).                                                                    (5.5)

These definitions are the usual definitions from elementary calculus with the interpretations of the words extended to the terminology of vector spaces. These definitions could be given in other equivalent forms, but those given will suffice for our purposes.

Theorem 5.2. Every (bounded) linear functional in V̂ is continuous on all of V.

proof. Let M be any positive real number such that |φ(β)| ≤ M ‖β‖ holds for all β ∈ V. Then, for the given ε > 0, it suffices (uniformly) to take δ = ε/M. For any β_0 we have

|φ(β) − φ(β_0)| = |φ(β − β_0)| ≤ M ‖β − β_0‖ < ε                                              (5.6)

whenever ‖β − β_0‖ < δ. □

Theorem 5.3. Let A = {α_1, ..., α_n} be any basis in V. There exist positive real numbers C and D, depending only on the inner product and the chosen basis, such that for any ξ = Σ_{i=1}^n x_i α_i ∈ V we have

C Σ_{i=1}^n |x_i| ≤ ‖ξ‖ ≤ D Σ_{i=1}^n |x_i|.                                                  (5.7)

proof. By the triangle inequality

‖ξ‖ = ‖ Σ_i x_i α_i ‖ ≤ Σ_i ‖x_i α_i‖ = Σ_i |x_i| ‖α_i‖.

Let D = max {‖α_i‖}. Then

‖ξ‖ ≤ Σ_i |x_i| · D = D Σ_i |x_i|.

On the other hand, let Â = {φ_1, ..., φ_n} be the dual basis of A. Then |x_i| = |φ_i(ξ)| ≤ ‖φ_i‖ · ‖ξ‖. Taking C^{−1} = Σ_{i=1}^n ‖φ_i‖ > 0, we have

Σ_{i=1}^n |x_i| ≤ Σ_{i=1}^n ‖φ_i‖ ‖ξ‖ = C^{−1} ‖ξ‖. □

The interesting and significant thing about Theorem 5.3 is that it implies that even though the limit was defined in terms of a given norm, the resulting limit is independent of the particular norm used, provided it is derived from a positive definite inner product. The inequalities in (5.7) say that a vector is small if and only if its coordinates are small, and a bound on the size of the vector is expressed in terms of the ordinary absolute values of the coordinates.

Let ξ be a vector-valued function of a scalar variable t. We write ξ = ξ(t) to indicate the dependence of ξ on t. A useful picture of this concept is to think of ξ as a position vector, a vector with its tail at the origin and its head locating a point or position in space. As t varies, the head and the point it determines move. The picture we wish to have in mind is that of ξ tracing out a curve as t varies over an interval (in the real case).

If the limit

lim_{h→0} [ ξ(t + h) − ξ(t) ] / h = dξ/dt                                                     (5.8)

exists, the vector-valued function ξ is said to be differentiable at t. This limit is usually called the derivative, but we wish to give this name to a different concept. At this moment the reasons for making the proposed distinction would seem artificial and hard to explain. Therefore, we shall re-examine this idea after we consider functions of vector variables.

Since ξ(t + h) − ξ(t) is a vector and h is a scalar, lim_{h→0} [ξ(t + h) − ξ(t)]/h is a vector. It is interpreted as a vector tangent to the curve at the point ξ(t).

Now, let f be a scalar-valued function defined on V. Often, f is not defined on all of V, but only on some subdomain. We do not wish to become involved in such questions. We assume that whenever we refer to the behavior of f at some ξ_0 ∈ V, f(ξ) is also defined for all points in a sphere around ξ_0 of radius sufficiently generous to include all other vectors under discussion.

Let ξ_0 be an arbitrary vector in V, which we take to be fixed for the moment. Let η be any other vector in V. For the given ξ_0 and η, we assume the expression

[ f(ξ_0 + hη) − f(ξ_0) ] / h                                                                  (5.9)


is defined for all h ≠ 0 in an interval around 0. If

lim_{h→0} [ f(ξ_0 + hη) − f(ξ_0) ] / h = f'(ξ_0, η)                                           (5.10)

exists for each η, the function f is said to be differentiable at ξ_0. It is continuously differentiable at ξ_0 if f is differentiable in a neighborhood around ξ_0 and f'(ξ, η) is continuous at ξ_0 for each η; that is, lim_{ξ→ξ_0} f'(ξ, η) = f'(ξ_0, η).

We wish to show that f'(ξ, η) is a linear function of η. However, in order to do this it is necessary that f'(ξ, η) satisfy some analytic conditions. The following theorems lead to establishing conditions sufficient to make this conclusion.

Theorem 5.4 (Mean value theorem). Assume that f'(ξ_0 + hη, η) exists for all h, 0 ≤ h ≤ 1, and that f(ξ_0 + hη) is continuous for all h, 0 ≤ h ≤ 1. Then there exists a real number θ, 0 < θ < 1, such that

f(ξ_0 + η) − f(ξ_0) = f'(ξ_0 + θη, η).                                                        (5.11)

proof. Let g(h) = f(ξ_0 + hη) for ξ_0 and η fixed. Then g(h) is a real-valued function of a real variable and

g'(h) = lim_{Δh→0} [ g(h + Δh) − g(h) ] / Δh
      = lim_{Δh→0} [ f(ξ_0 + hη + Δh η) − f(ξ_0 + hη) ] / Δh
      = f'(ξ_0 + hη, η)                                                                       (5.12)

exists by assumption for 0 ≤ h ≤ 1. By the mean value theorem for g(h) we have

g(1) − g(0) = g'(θ),     0 < θ < 1,                                                           (5.13)

or

f(ξ_0 + η) − f(ξ_0) = f'(ξ_0 + θη, η). □

Theorem 5.5. If f'(ξ, η) exists, then for a ∈ F, f'(ξ, aη) exists and

f'(ξ, aη) = a f'(ξ, η).                                                                       (5.14)

proof.  f'(ξ, aη) = lim_{h→0} [ f(ξ + ahη) − f(ξ) ] / h
                  = a lim_{ah→0} [ f(ξ + ahη) − f(ξ) ] / (ah)
                  = a f'(ξ, η)

for a ≠ 0, and f'(ξ, aη) = 0 for a = 0. □




Lemma 5.6. Assume f'(ξ_0, η_1) exists and that f'(ξ, η_2) exists in a neighborhood of ξ_0. If f'(ξ, η_2) is continuous at ξ_0, then f'(ξ_0, η_1 + η_2) exists and

f'(ξ_0, η_1 + η_2) = f'(ξ_0, η_1) + f'(ξ_0, η_2).                                             (5.15)

proof.

f'(ξ_0, η_1 + η_2) = lim_{h→0} [ f(ξ_0 + hη_1 + hη_2) − f(ξ_0) ] / h
 = lim_{h→0} [ f(ξ_0 + hη_1 + hη_2) − f(ξ_0 + hη_1) + f(ξ_0 + hη_1) − f(ξ_0) ] / h
 = lim_{h→0} (1/h) f'(ξ_0 + hη_1 + θhη_2, hη_2) + f'(ξ_0, η_1)
          by Theorem 5.4,
 = lim_{h→0} f'(ξ_0 + hη_1 + θhη_2, η_2) + f'(ξ_0, η_1)
          by Theorem 5.5,
 = f'(ξ_0, η_2) + f'(ξ_0, η_1)

by continuity at ξ_0. □

Theorem 5.7. Let A = {α_1, ..., α_n} be a basis of V over F. Assume f'(ξ_0, α_i) exists for all i, and that f'(ξ, α_i) exists in a neighborhood of ξ_0 and is continuous at ξ_0 for n − 1 of the elements of A. Then f'(ξ_0, η) exists for all η and is linear in η.

proof. Suppose f'(ξ, α_i) exists in a neighborhood of ξ_0 and is continuous for i = 2, 3, ..., n. Let S_k = ⟨α_1, ..., α_k⟩. Theorem 5.5 says that f'(ξ_0, η) is linear in η for η ∈ S_1.

By induction, assume f'(ξ_0, η) is linear in η for η ∈ S_k. Then by Theorem 5.5, f'(ξ, a_{k+1}α_{k+1}) exists in a neighborhood of ξ_0 and is continuous at ξ_0 for all a_{k+1} ∈ F. By Lemma 5.6, f'(ξ_0, η + a_{k+1}α_{k+1}) = f'(ξ_0, η) + f'(ξ_0, a_{k+1}α_{k+1}) = f'(ξ_0, η) + a_{k+1} f'(ξ_0, α_{k+1}). Since all vectors in S_{k+1} are of the form η + a_{k+1}α_{k+1}, f'(ξ_0, η) is linear for all η ∈ S_{k+1}. Finally, f'(ξ_0, η) is linear for all η ∈ S_n = V. □

Theorem 5.7 is usually applied under the assumption that f'(ξ, η) is continuously differentiable in a neighborhood of ξ_0 for all η ∈ V. This is certainly true if f'(ξ, α_i) is continuously differentiable in a neighborhood of ξ_0 for all α_i in a basis of V. Under these conditions, f'(ξ, η) is a scalar-valued linear function of η defined on V. The linear functional thus determined depends on ξ and we denote it by df(ξ). Thus df(ξ) is a linear functional such that df(ξ)(η) = f'(ξ, η) for all η ∈ V. df(ξ) is called the differential of f at ξ.




If A = {α_1, ..., α_n} is any basis of V, any vector ξ ∈ V can be represented in the form ξ = Σ_i x_i α_i. Thus, any function of ξ also depends on the coordinates (x_1, ..., x_n). To avoid introducing a new symbol for this function, we write

f(ξ) = f(x_1, ..., x_n).                                                                      (5.16)

Since df(ξ) is a linear functional it can be expressed in terms of the dual basis Â = {φ_1, ..., φ_n}. The coordinates of df(ξ) are easily computed by evaluating df(ξ) for the basis elements α_i ∈ A. We see that

df(ξ)(α_i) = f'(ξ, α_i) = lim_{h→0} [ f(ξ + hα_i) − f(ξ) ] / h
           = lim_{h→0} [ f(x_1, ..., x_i + h, ..., x_n) − f(x_1, ..., x_n) ] / h
           = ∂f/∂x_i.                                                                         (5.17)

Thus,

df(ξ) = Σ_i (∂f/∂x_i) φ_i.                                                                    (5.18)

For any η = Σ_i y_i α_i,

df(ξ)(η) = f'(ξ, η) = Σ_i (∂f/∂x_i) y_i.                                                      (5.19)

From (5.17), the assumption that f'(ξ, η) is a continuous function of ξ implies that the partial derivatives are continuous. Conversely, the continuity of the partial derivatives implies that the conditions of Theorem 5.7 are satisfied and, therefore, that f'(ξ, η) is a continuous function of ξ. In either case, f'(ξ, η) is a linear function of η and formula (5.19) holds.
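A quick numerical illustration of (5.10) and (5.19) may help; this is my own sketch, not the text's, and the function chosen here anticipates Example 1 at the end of this section.

    import numpy as np

    def f(x):
        x1, x2 = x
        return x1**2 + 3 * x2**2 - 2 * x1 * x2

    def directional_derivative(f, xi, eta, h=1e-6):
        """Difference quotient (5.10): [f(xi + h*eta) - f(xi)] / h."""
        xi, eta = np.asarray(xi, float), np.asarray(eta, float)
        return (f(xi + h * eta) - f(xi)) / h

    xi = np.array([1.0, 2.0])
    eta = np.array([0.5, -1.5])
    grad = np.array([2 * xi[0] - 2 * xi[1], 6 * xi[1] - 2 * xi[0]])   # the partials of f
    print(directional_derivative(f, xi, eta), grad @ eta)             # nearly equal, as (5.19) asserts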

Theorem 5.8. If f'(ξ, η) is a continuous function of ξ for all η, then df(ξ) is a continuous function of ξ.

proof. By formula (5.17), if f'(ξ, η) is a continuous function of ξ, then ∂f/∂x_i = f'(ξ, α_i) is a continuous function of ξ. Because of formula (5.18) it then follows that df(ξ) is a continuous function of ξ. □

If η is a vector of unit length, f'(ξ, η) is called the directional derivative of f(ξ) in the direction of η.



Consider the function that has the value x_i for each ξ = Σ_i x_i α_i. Denote this function by X_i. Then

dX_i(ξ)(α_j) = lim_{h→0} [ X_i(ξ + hα_j) − X_i(ξ) ] / h = δ_{ij}.

Since dX_i(ξ)(α_j) = δ_{ij} = φ_i(α_j), we see that

dX_i(ξ) = φ_i.

It is more suggestive of traditional calculus to let x_i denote the function X_i, and to denote φ_i by dx_i. Then formula (5.18) takes the form

df = Σ_i (∂f/∂x_i) dx_i.                                                                      (5.18)

Let us turn our attention for a moment to vector-valued functions of vector variables. Let U and V be vector spaces over the same field (either real or complex). Let U be of dimension n and V of dimension m. We assume that positive definite inner products are defined in both U and V. Let F be a function defined on U with values in V. For ξ and η ∈ U, F'(ξ, η) is defined by the limit

F'(ξ, η) = lim_{h→0} [ F(ξ + hη) − F(ξ) ] / h,                                                (5.20)

if this limit exists. If F'(ξ, η) exists, F is said to be differentiable at ξ. F is continuously differentiable at ξ_0 if F is differentiable in a neighborhood around ξ_0 and F'(ξ, η) is continuous at ξ_0 for each η; that is, lim_{ξ→ξ_0} F'(ξ, η) = F'(ξ_0, η).

In analogy to the derivative of a scalar-valued function of a vector variable, we wish to show that under appropriate conditions F'(ξ, η) is linear in η.

Theorem 5.9. If for each η ∈ U, F'(ξ, η) is defined in a neighborhood of ξ_0 and continuous at ξ_0, then F'(ξ_0, η) is linear in η.

proof. Let ψ be a linear functional in V̂. Let f be defined by the equation

f(ξ) = ψ{F(ξ)}.                                                                               (5.21)

Since ψ is linear and continuous,

f'(ξ, η) = lim_{h→0} ψ{ [F(ξ + hη) − F(ξ)] / h }
         = ψ{ lim_{h→0} [F(ξ + hη) − F(ξ)] / h }
         = ψ{F'(ξ, η)}.                                                                       (5.22)

Since ψ is continuous and defined for all of V, f'(ξ, η) is defined in a neighborhood of ξ_0 and continuous at ξ_0. By Theorem 5.7, f'(ξ_0, η) is linear in η. Thus,

ψ{F'(ξ_0, a_1η_1 + a_2η_2)} = a_1 ψ{F'(ξ_0, η_1)} + a_2 ψ{F'(ξ_0, η_2)}
                            = ψ{ a_1 F'(ξ_0, η_1) + a_2 F'(ξ_0, η_2) }.

Since F'(ξ_0, a_1η_1 + a_2η_2) − a_1 F'(ξ_0, η_1) − a_2 F'(ξ_0, η_2) is annihilated by all ψ ∈ V̂, it must be 0; that is,

F'(ξ_0, a_1η_1 + a_2η_2) = a_1 F'(ξ_0, η_1) + a_2 F'(ξ_0, η_2). □                             (5.23)

For each ξ, the mapping of η ∈ U onto F'(ξ, η) ∈ V is a linear transformation which we denote by F'(ξ). F'(ξ) is called the derivative of F at ξ.

It is of some interest to introduce bases in U and V and find the corresponding matrix representation of F'(ξ). Let A = {α_1, ..., α_n} be a basis in U and B = {β_1, ..., β_m} be a basis in V. Let ξ = Σ_j x_j α_j and F(ξ) = Σ_k y_k(ξ) β_k. Let B̂ = {ψ_1, ..., ψ_m} be the dual basis of B. Then ψ_k(F(ξ)) = y_k(ξ). If F'(ξ) is represented by the matrix J = [a_{kj}], we have

F'(ξ)(α_j) = Σ_k a_{kj} β_k.                                                                  (5.24)

Then

a_{kj} = ψ_k( F'(ξ)(α_j) ) = ψ_k( lim_{h→0} [ F(ξ + hα_j) − F(ξ) ] / h )
       = lim_{h→0} [ ψ_k F(ξ + hα_j) − ψ_k F(ξ) ] / h
       = lim_{h→0} [ y_k(ξ + hα_j) − y_k(ξ) ] / h
       = ∂y_k/∂x_j,                                                                           (5.25)

according to formula (5.17). Thus F'(ξ) is represented by the matrix

J(ξ) = [ ∂y_k/∂x_j ].                                                                         (5.26)

J(ξ) is known as the Jacobian matrix of F'(ξ) with respect to the bases A and B. The case where U = V, that is, where F is a mapping of V into itself, is of special interest. Then F'(ξ) is a linear transformation of V into itself and the corresponding Jacobian matrix J(ξ) is a square matrix. Since the trace of J(ξ) is invariant under similarity transformation, it depends on F'(ξ) alone and not on the matrix representation of F'(ξ). This trace is called the divergence of F at ξ:

Tr(F'(ξ)) = div F(ξ) = Σ_{i=1}^n ∂y_i/∂x_i.                                                   (5.27)

Let us re-examine the differentiation of functions of scalar variables, which at this point seems to be treated in a way essentially different from that of functions of vector variables. It would also be desirable if the treatment of such functions could be made to appear as a special case of functions of vector variables without distorting elementary calculus to fit the generalization.

Let W be a 1-dimensional vector space and let {γ_1} be a basis of W. We can identify the scalar t with the vector tγ_1, and consider ξ(t) as just a shorthand way of writing ξ(tγ_1). In keeping with formula (5.10), we consider

lim_{h→0} [ ξ(tγ_1 + hη) − ξ(tγ_1) ] / h = ξ'(tγ_1, η).                                       (5.28)

Since W is 1-dimensional it will be sufficient to take the case η = γ_1. Then

ξ'(tγ_1)(γ_1) = ξ'(tγ_1, γ_1) = lim_{h→0} [ ξ(tγ_1 + hγ_1) − ξ(tγ_1) ] / h
              = lim_{h→0} [ ξ(t + h) − ξ(t) ] / h
              = dξ/dt.                                                                        (5.29)

Thus dξ/dt is the value of the linear transformation ξ'(tγ_1) applied to the basis vector γ_1.

Theorem 5.10. Let F be a mapping of U into V. If F is linear, then F'(ξ) = F for all ξ ∈ U.

proof.  F'(ξ)(η) = F'(ξ, η) = lim_{h→0} [ F(ξ + hη) − F(ξ) ] / h
                 = lim_{h→0} F(hη)/h
                 = F(η). □                                                                    (5.30)

Finally, let us consider the differentiation of composite functions. Let F be a mapping of U into V, and G a mapping of V into W. Then GF = H is a mapping of U into W.

Theorem 5.11. If F is linear and G is differentiable, then (GF)'(ξ) = G'(F(ξ))F. If G is linear and F is differentiable, then (GF)'(ξ) = GF'(ξ).




proof. Assume F is linear and G is differentiable. Then

(GF)'(ξ)(η) = (GF)'(ξ, η) = lim_{h→0} [ GF(ξ + hη) − GF(ξ) ] / h
            = lim_{h→0} [ G(F(ξ) + hF(η)) − G(F(ξ)) ] / h
            = G'(F(ξ), F(η))
            = G'(F(ξ))F(η).                                                                   (5.31)

Assume G is linear and F is differentiable. Then

(GF)'(ξ)(η) = (GF)'(ξ, η) = lim_{h→0} [ GF(ξ + hη) − GF(ξ) ] / h
            = lim_{h→0} G{ [ F(ξ + hη) − F(ξ) ] / h }
            = G{ lim_{h→0} [ F(ξ + hη) − F(ξ) ] / h }
            = G[ F'(ξ, η) ]
            = G[ F'(ξ)(η) ]
            = [GF'(ξ)](η). □                                                                  (5.32)

Theorem 5.12. Let F be a mapping of U into V, and G a mapping of V into W. Assume that F is continuously differentiable at ξ_0 and that G is continuously differentiable in a neighborhood of τ_0 = F(ξ_0). Then GF is continuously differentiable at ξ_0 and (GF)'(ξ_0) = G'(F(ξ_0))F'(ξ_0).

proof. For notational convenience let F(ξ_0 + hη) − F(ξ_0) = ω. Let ψ be any linear functional in Ŵ. Then ψG is a scalar-valued function on V continuously differentiable in a neighborhood of τ_0. Hence

ψGF(ξ_0 + hη) − ψGF(ξ_0) = ψG(τ_0 + ω) − ψG(τ_0)
                         = (ψG)'(τ_0 + θω, ω)

for some real θ, 0 < θ < 1. Now

lim_{h→0} ω/h = lim_{h→0} [ F(ξ_0 + hη) − F(ξ_0) ] / h = F'(ξ_0, η)

and

lim_{h→0} (τ_0 + θω) = τ_0.

Since G'(τ, ω) is continuous in τ in a neighborhood of τ_0 and bounded linear in ω,

(ψGF)'(ξ_0)(η) = (ψGF)'(ξ_0, η)
 = lim_{h→0} (1/h) (ψG)'(τ_0 + θω, ω)
 = lim_{h→0} ψG'( τ_0 + θω, ω/h )
 = ψG'( τ_0, F'(ξ_0, η) )
 = ψG'( F(ξ_0) )( F'(ξ_0)(η) )
 = ψ[ G'(F(ξ_0))F'(ξ_0) ](η).                                                                 (5.33)

Since (5.33) holds for all ψ ∈ Ŵ, we have

(GF)'(ξ_0)(η) = G'(F(ξ_0))F'(ξ_0)(η). □                                                       (5.34)

This gives a very reasonable interpretation of the chain rule for differentiation. The derivative of the composite function GF is the linear function obtained by taking the composite of the derivatives of F and G.

Notice, also, that by combining Theorem 5.10 with Theorem 5.12 we can see that Theorem 5.11 is a special case of Theorem 5.12. If F is linear, (GF)'(ξ) = G'(F(ξ))F'(ξ) = G'(F(ξ))F. If G is linear, (GF)'(ξ) = G'(F(ξ))F'(ξ) = GF'(ξ).

Example 1. For ξ = x_1ξ_1 + x_2ξ_2, let f(ξ) = x_1² + 3x_2² − 2x_1x_2. Then for η = y_1ξ_1 + y_2ξ_2 we have

f'(ξ, η) = lim_{h→0} [ f(ξ + hη) − f(ξ) ] / h
         = 2x_1y_1 + 6x_2y_2 − 2x_1y_2 − 2x_2y_1
         = (2x_1 − 2x_2)y_1 + (6x_2 − 2x_1)y_2.

If {φ_1, φ_2} is the basis of R̂² dual to {ξ_1, ξ_2}, then

f'(ξ)(η) = f'(ξ, η) = [ (2x_1 − 2x_2)φ_1 + (6x_2 − 2x_1)φ_2 ](η),

and hence

f'(ξ) = (2x_1 − 2x_2)φ_1 + (6x_2 − 2x_1)φ_2 = (∂f/∂x_1)φ_1 + (∂f/∂x_2)φ_2.

Example 2. For ξ = x_1ξ_1 + x_2ξ_2 + x_3ξ_3, let F(ξ) = sin x_3 ξ_1 + x_1x_2x_3 ξ_2 + (x_1² + x_3²)ξ_3. Then for η = y_1ξ_1 + y_2ξ_2 + y_3ξ_3 we have

F'(ξ, η) = y_3 cos x_3 ξ_1 + (x_2x_3y_1 + x_1x_3y_2 + x_1x_2y_3)ξ_2 + (2x_1y_1 + 2x_3y_3)ξ_3
         = F'(ξ)(η).


We see that F'(ξ) is a linear transformation of R³ into itself represented with respect to the basis {ξ_1, ξ_2, ξ_3} by the matrix

[    0        0      cos x_3 ]
[ x_2x_3   x_1x_3    x_1x_2  ]
[  2x_1       0       2x_3   ].

In both of these examples rather conventional notation has been used. However, the interpretation of the context of these computations is quite different from the conventional interpretation. In Example 1, f(ξ) should be regarded as a function mapping R² into R, and f'(ξ) should be regarded as a linear approximation of f at the point ξ. Thus f'(ξ) ∈ Hom(R², R) = R̂². In Example 2, F(ξ) is a function of R³ into R³ and F'(ξ) is a linear approximation of F at the point ξ. Thus F'(ξ) ∈ Hom(R³, R³).
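A numerical check of Example 2 (a sketch of mine, not part of the text): each column of the Jacobian matrix is approximated by the difference quotient [F(ξ + hα_j) − F(ξ)]/h of (5.25), and its trace is the divergence (5.27).

    import numpy as np

    def F(x):
        x1, x2, x3 = x
        return np.array([np.sin(x3), x1 * x2 * x3, x1**2 + x3**2])    # the F of Example 2

    def jacobian(F, xi, h=1e-6):
        """Columns are the difference quotients [F(xi + h e_j) - F(xi)] / h, cf. (5.25)."""
        xi = np.asarray(xi, dtype=float)
        cols = [(F(xi + h * e) - F(xi)) / h for e in np.eye(len(xi))]
        return np.column_stack(cols)

    xi = np.array([1.0, 2.0, 0.5])
    J = jacobian(F, xi)
    exact = np.array([[0.0, 0.0, np.cos(xi[2])],
                      [xi[1] * xi[2], xi[0] * xi[2], xi[0] * xi[1]],
                      [2 * xi[0], 0.0, 2 * xi[2]]])
    print(np.allclose(J, exact, atol=1e-4))       # True: matches the matrix displayed above
    print(np.trace(J))                            # div F at xi, formula (5.27)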

6 | Spectral Decomposition of Linear Transformations

Most of this section requires no more than the material through Section 7 
of Chapter III. However, familiarity with the Jordan normal form as 
developed in Section 8 of Chapter III is required for the last part of this 
section and very helpful for the first part. 

Let σ be a linear transformation of an n-dimensional vector space V into itself. The set of eigenvalues of σ is called the spectrum of σ. Assume that V has a basis {α_1, ..., α_n} of eigenvectors of σ. Let λ_i be the eigenvalue corresponding to α_i. Let S_i be the subspace spanned by α_i, and let π_i be the projection of V onto S_i along S_1 ⊕ ··· ⊕ S_{i−1} ⊕ S_{i+1} ⊕ ··· ⊕ S_n. These projections have the properties

π_i² = π_i                                                                                    (6.1)

and

π_i π_j = 0     for i ≠ j.                                                                    (6.2)

Any linear transformation σ for which σ² = σ is said to be idempotent. If σ and τ are two linear transformations such that στ = τσ = 0, we say they are orthogonal. Similar terminology is applied to matrices representing linear transformations with these properties.

Every ξ ∈ V can be written in the form

ξ = ξ_1 + ··· + ξ_n,                                                                          (6.3)

where ξ_i ∈ S_i. Then π_i(ξ) = ξ_i so that

ξ = π_1(ξ) + ··· + π_n(ξ) = (π_1 + ··· + π_n)(ξ).                                             (6.4)


Since (6.4) holds for every ξ ∈ V we have

1 = π_1 + π_2 + ··· + π_n.                                                                    (6.5)

A formula like (6.5) in which the identity transformation is expressed as a sum of mutually orthogonal projections is called a resolution of the identity. From (6.1) and (6.2) it follows that (π_1 + π_2)² = π_1 + π_2, so that a sum of projections orthogonal to each other is a projection. Conversely, it is sometimes possible to express a projection as a sum of projections. If a projection cannot be expressed as a sum of non-zero projections, it is said to be irreducible. Since the projections given are onto 1-dimensional subspaces, they are irreducible. If the projections appearing in a resolution of the identity are irreducible, the resolution is called irreducible or maximal.

Now, for ξ = Σ_{i=1}^n ξ_i as in (6.3) we have

σ(ξ) = σ( Σ_{i=1}^n ξ_i ) = Σ_{i=1}^n σ(ξ_i)
     = Σ_{i=1}^n λ_i ξ_i
     = Σ_{i=1}^n λ_i π_i(ξ)
     = ( Σ_{i=1}^n λ_i π_i )(ξ).                                                              (6.6)

Since (6.6) holds for every ξ ∈ V we have

σ = Σ_{i=1}^n λ_i π_i.                                                                        (6.7)

A representation of σ in the form of (6.7), where each λ_i is an eigenvalue of σ and each π_i is a projection, is called a spectral decomposition. If the eigenvalues are each of multiplicity 1, the decomposition is unique. If some of the eigenvalues are of higher multiplicity, the choice of projections is not unique, but the number of times each eigenvalue occurs in the decomposition is equal to its multiplicity and therefore unique.

An advantage of a spectral decomposition like (6.7) is that because of (6.1) and (6.2) we have

σ² = Σ_{i=1}^n λ_i² π_i,                                                                      (6.8)

σ^k = Σ_{i=1}^n λ_i^k π_i,                                                                    (6.9)

and

f(σ) = Σ_{i=1}^n f(λ_i) π_i                                                                   (6.10)

for any polynomial f(x) with coefficients in F.

Given a matrix representing a linear transformation σ, there are several effective methods for finding the matrices representing the projections π_i, and, from them, the spectral decomposition. Any computational procedure which yields the eigenvectors of σ must necessarily give the projections since with the eigenvectors as basis the projections have very simple representations. However, what is usually wanted is the representations of the projections in the original coordinate system. Let S_j = (s_{1j}, s_{2j}, ..., s_{nj}) be the representation of α_j in the original coordinate system. Since we have assumed there is a basis of eigenvectors, the matrix S = [s_{ij}] is non-singular. Let T = [t_{ij}] be the inverse of S. Then

P_k = [ s_{ik} t_{kj} ]                                                                       (6.11)

represents π_k, as can easily be checked.
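In matrix terms, (6.11) says that P_k is the outer product of the kth column of S with the kth row of T = S^{-1}. A brief numerical sketch follows (my own illustration, not part of the text; the matrix A is chosen only as an example, and happens to be the one worked out later in this section).

    import numpy as np

    A = np.array([[2.0, 2.0],
                  [2.0, -1.0]])
    lam, S = np.linalg.eig(A)                 # columns of S represent the eigenvectors
    T = np.linalg.inv(S)

    P = [np.outer(S[:, k], T[k, :]) for k in range(len(lam))]        # formula (6.11)

    print(np.allclose(sum(P), np.eye(2)))                            # resolution of the identity (6.5)
    print(np.allclose(sum(l * Pk for l, Pk in zip(lam, P)), A))      # spectral decomposition (6.7)
    print(all(np.allclose(Pk @ Pk, Pk) for Pk in P))                 # each P_k is idempotent (6.1)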

We give another method for finding the projections which does not require finding the eigenvectors first, although they will appear in the end result. We introduce this method because it is useful in situations where a basis of eigenvectors does not exist. We do, however, assume that the characteristic polynomial factors into linear factors. Let {λ_1, ..., λ_p} be the distinct eigenvalues of σ, and let

m(x) = (x − λ_1)^{s_1} ··· (x − λ_p)^{s_p}                                                    (6.12)

be the minimum polynomial for σ. Set

h_i(x) = m(x) / (x − λ_i)^{s_i}.                                                              (6.13)

We wish to show now that there exist polynomials g_1(x), ..., g_p(x) such that

1 = g_1(x)h_1(x) + ··· + g_p(x)h_p(x).                                                        (6.14)

Consider the set of all possible non-zero polynomials that can be written in the form p_1(x)h_1(x) + ··· + p_p(x)h_p(x), where the p_i(x) are polynomials (not all zero since the resulting expression must be non-zero). At least one polynomial, for example h_1(x), can be written in this form. Hence, there is a non-zero polynomial of lowest degree that can be written in this form. Let d(x) be such a polynomial and let g_1(x), ..., g_p(x) be the corresponding coefficient polynomials,

d(x) = g_1(x)h_1(x) + ··· + g_p(x)h_p(x).                                                     (6.15)




We assert that d(x) divides all h_i(x). For example, let us try to divide h_1(x) by d(x). Either d(x) divides h_1(x) exactly or there is a remainder r_1(x) of degree less than the degree of d(x). Thus,

h_1(x) = d(x)q_1(x) + r_1(x),                                                                 (6.16)

where q_1(x) is the quotient. Suppose, for the moment, that the remainder r_1(x) is not zero. Then

r_1(x) = h_1(x) − d(x)q_1(x)
       = h_1(x){ 1 − g_1(x)q_1(x) } − g_2(x)q_1(x)h_2(x) − ··· − g_p(x)q_1(x)h_p(x).          (6.17)

But this contradicts the selection of d(x) as a non-zero polynomial of smallest degree which can be written in this form. Thus d(x) must divide h_1(x). Similarly, d(x) divides each h_i(x).

Since the factorization of m(x) is unique, the h_i(x) have no common non-constant factor and d(x) must be a constant. Since we can divide any of these expressions by a non-zero constant without altering its form, we can take d(x) to be 1. Thus, we have an expression in the form of (6.14).

If we divide (6.14) by m(x) we obtain

1/m(x) = g_1(x)/(x − λ_1)^{s_1} + ··· + g_p(x)/(x − λ_p)^{s_p}.                               (6.18)

This is the familiar partial fractions decomposition and the polynomials g_i(x) can be found by any of several effective techniques.

Now setting e_i(x) = g_i(x)h_i(x), we see that we have obtained a set of polynomials {e_1(x), ..., e_p(x)} such that

(1) 1 = e_1(x) + ··· + e_p(x),

(2) e_i(x)e_j(x) is divisible by m(x) for i ≠ j,

(3) (x − λ_i)^{s_i} e_i(x) is divisible by m(x).
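The polynomials g_i(x) are exactly the numerators in the partial-fraction expansion (6.18), so a computer algebra system produces the e_i(x) directly. A small sketch of my own follows, using sympy and, as an assumed input, the minimum polynomial m(x) = (x − 3)(x + 2) of the example worked out further below.

    import sympy as sp

    x = sp.symbols('x')
    m = (x - 3) * (x + 2)                     # minimum polynomial with distinct roots

    print(sp.apart(1 / m, x))                 # 1/(5*(x - 3)) - 1/(5*(x + 2)), i.e. formula (6.18)

    # e_i(x) = g_i(x) h_i(x) with h_i(x) = m(x)/(x - lambda_i)
    e1 = sp.Rational(-1, 5) * (x - 3)         # lambda_1 = -2:  g_1 = -1/5, h_1 = x - 3
    e2 = sp.Rational(1, 5) * (x + 2)          # lambda_2 =  3:  g_2 =  1/5, h_2 = x + 2
    print(sp.expand(e1 + e2))                 # 1, property (1)
    print(sp.rem(sp.expand(e1 * e2), m, x))   # 0, property (2): e_1 e_2 is divisible by m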

Now, we use these polynomials to form polynomial expressions in the linear transformation σ. Then {e_1(σ), ..., e_p(σ)} is a set of linear transformations with the properties that

(1) 1 = e_1(σ) + ··· + e_p(σ),

(2) e_i(σ)e_j(σ) = 0 for i ≠ j,

(3) (σ − λ_i)^{s_i} e_i(σ) = 0.                                                               (6.19)

From (1) and (2) it follows, also, that

(4) e_i(σ) = 1 · e_i(σ) = ( e_1(σ) + ··· + e_p(σ) ) e_i(σ)
           = e_1(σ)e_i(σ) + ··· + e_p(σ)e_i(σ) = e_i(σ)e_i(σ).



These four properties suffice to show that the e_i(σ) are mutually orthogonal projections. From (3) we see that e_i(σ)(V) is in the kernel of (σ − λ_i)^{s_i}. As in Chapter III-8, we denote the kernel of (σ − λ_i)^{s_i} by M_i. Since e_j(x) is divisible by (x − λ_i)^{s_i} for j ≠ i, e_j(σ)(β_i) = 0 for β_i ∈ M_i. Hence, if β_i ∈ M_i, we have

β_i = ( e_1(σ) + ··· + e_p(σ) )(β_i) = e_i(σ)(β_i).                                           (6.20)

This shows that e_i(σ) acts like the identity on M_i. Then M_i = e_i(σ)(M_i) ⊂ e_i(σ)(V) ⊂ M_i, so that e_i(σ)(V) = M_i. By (6.19), for any ξ ∈ V we have

ξ = ( e_1(σ) + ··· + e_p(σ) )(ξ) = β_1 + ··· + β_p,                                           (6.21)

where β_i = e_i(σ)(ξ) ∈ M_i. If ξ = 0, then β_i = e_i(σ)(ξ) = e_i(σ)(0) = 0, so that the representation of 0 in the form of a sum like (6.21) with β_i ∈ M_i is unique. Thus,

V = M_1 ⊕ ··· ⊕ M_p.                                                                          (6.22)
This provides an independent proof of Theorem 8.3 of Chapter III. 
Let σ_i = σe_i(σ). Then

(1) σ = σ_1 + ··· + σ_p,

(2) σ_i σ_j = 0 for i ≠ j, and

(3) f(σ) = f(σ_1) + ··· + f(σ_p),                                                             (6.23)

for any polynomial f(x) with coefficients in F. If s_i = 1, then (σ − λ_i)e_i(σ) = 0 so that σ_i = λ_i e_i(σ). In this case M_i is the eigenspace corresponding to λ_i. If the multiplicity of λ_i is 1, then dim M_i = 1 and e_i(σ) = π_i is the projection onto M_i. If A represents σ, then P_i = e_i(A) represents π_i. If the multiplicity of λ_i is greater than 1, then e_i(σ) is a reducible projection. e_i(σ) can be reduced to a sum of irreducible projections in many ways and there is no reasonable way to select a unique reduction.

If s_i > 1, the situation is somewhat more complicated. Let ν_i = (σ − λ_i)e_i(σ). Then ν_i^{s_i} = (σ − λ_i)^{s_i} e_i(σ) = 0. A linear transformation for which some power vanishes is said to be nilpotent. Thus,

σ_i = λ_i e_i(σ) + ν_i                                                                        (6.24)

is the sum of a scalar times an idempotent transformation and a nilpotent transformation.

Since σ commutes with e_i(σ), σ(M_i) = σe_i(σ)(V) = e_i(σ)σ(V) ⊂ e_i(σ)(V) = M_i. Thus M_i is invariant under σ. It is also true that σ_i(M_i) ⊂ M_i and σ_j(M_i) = {0} for j ≠ i. Each M_i is associated with one of the eigenvalues. It is often possible to reduce each M_i to a direct sum of subspaces such that each summand is invariant under σ. In a manner analogous to what has been done so far, we can find for each summand a linear transformation which has the same effect as σ on that summand and vanishes on all other summands. Then this linear transformation can also be expressed as the sum of a scalar times an idempotent transformation and a nilpotent transformation. The determination of the Jordan normal form involves this kind of reduction in some form or other. The matrix B_i of Theorem 8.5 in Chapter III can be written in the form λ_iE_i + N_i, where E_i is an idempotent matrix and N_i is a nilpotent matrix. However, in the following discussion (6.23) and (6.24) are sufficient for our purposes and we shall not concern ourselves with the details of a further reduction at this point.

There is a particularly interesting and important application of the spectral decomposition and the decomposition (6.23) to systems of linear differential equations. In order to prepare for this application we have to discuss the meaning of infinite series and sequences of matrices and apply these decompositions to the simplification of these series.

If {A_k}, A_k = [a_{ij}(k)], is a sequence of real matrices, we say

lim_{k→∞} A_k = A = [a_{ij}]

if and only if lim_{k→∞} a_{ij}(k) = a_{ij} for each i and j. Similarly, the series Σ_{k=0}^∞ c_k A_k is said to converge if and only if the sequence of partial sums converges. It is not difficult to show that the series

Σ_{k=0}^∞ A^k / k!                                                                            (6.25)

converges for each A. In analogy with the series representation of e^x, the series (6.25) is taken to be the definition of e^A.

If A and B commute, then (A + B)^k can be expanded by the binomial theorem. It can then be shown that

e^{A+B} = e^A e^B.

This "law of exponents" is not generally satisfied if A and B do not commute. If E is an idempotent matrix, that is, E² = E, then

e^{λE} = I + Σ_{k=1}^∞ (λ^k / k!) E = I − E + E( 1 + λ + λ²/2! + ··· ) = I + (e^λ − 1)E.      (6.26)

If N is a nilpotent matrix, then the series representation of e^N terminates after a finite number of terms; that is, e^N is a polynomial in N. These observations, together with the spectral decomposition and the decomposition






(6.23) in terms of commuting transformations (and hence matrices) of these types, will enable us to express e^A in terms of finite sums and products of matrices.

Let σ = σ_1 + ··· + σ_p as in (6.23). Let A represent σ and A_i represent σ_i. Let E_i represent e_i(σ) and N_i represent ν_i. Since each of these linear transformations is a polynomial in σ, they (and the matrices that represent them) commute. Assume at first that σ has a spectral decomposition, that is, each ν_i = 0. Then A_i = λ_i E_i and

e^A = e^{A_1 + ··· + A_p} = e^{A_1} e^{A_2} ··· e^{A_p}
    = [ I + (e^{λ_1} − 1)E_1 ][ I + (e^{λ_2} − 1)E_2 ] ··· [ I + (e^{λ_p} − 1)E_p ]
    = I + Σ_{i=1}^p (e^{λ_i} − 1)E_i                                                          (6.27)

because of the orthogonality of the E_i. Then

e^A = Σ_{i=1}^p e^{λ_i} E_i                                                                   (6.28)

because Σ_{i=1}^p E_i = I. This is a generalization of formula (6.10) to a function of A which is not a polynomial.

The situation when σ does not have a spectral decomposition is slightly more complicated. Then A_i = λ_i E_i + N_i and

e^A = Π_{i=1}^p e^{λ_i E_i} Π_{i=1}^p e^{N_i}.                                                (6.29)

As an example of formula (6.28) consider

A = [ 2   2 ]
    [ 2  −1 ].

The characteristic polynomial is x² − x − 6 = (x − 3)(x + 2). To fix our notation we take λ_1 = −2 and λ_2 = 3. Since

1 / ( (x − 3)(x + 2) ) = (−1/5) / (x + 2) + (1/5) / (x − 3),

we see that e_1(x) = −(x − 3)/5 and e_2(x) = (x + 2)/5. Thus,

E_1 = (1/5) [  1  −2 ]        E_2 = (1/5) [ 4  2 ]
            [ −2   4 ],                   [ 2  1 ],

I = E_1 + E_2,
A = −2E_1 + 3E_2,
e^A = e^{−2}E_1 + e^3E_2.
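A numerical check of this example (my own sketch; scipy is assumed to be available only for the comparison with the series definition (6.25)):

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[2.0, 2.0],
                  [2.0, -1.0]])
    E1 = np.array([[1.0, -2.0], [-2.0, 4.0]]) / 5     # e_1(A) = -(A - 3I)/5
    E2 = np.array([[4.0, 2.0], [2.0, 1.0]]) / 5       # e_2(A) = (A + 2I)/5

    print(np.allclose(E1 + E2, np.eye(2)))            # I = E_1 + E_2
    print(np.allclose(-2 * E1 + 3 * E2, A))           # A = -2 E_1 + 3 E_2
    print(np.allclose(np.exp(-2) * E1 + np.exp(3) * E2, expm(A)))   # (6.28) against the series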






BIBLIOGRAPHICAL NOTES 

In finite dimensional vector spaces many things that the spectrum of a transformation 
can be used for can be handled in other ways. In infinite dimensional vector spaces matrices 
are not available and the spectrum plays a more central role. The treatment in P. Halmos, 
Introduction to Hilbert Space, is recommended because of the clarity with which the 
spectrum is developed. 



EXERCISES 



1. Use the method of spectral decomposition to resolve each matrix A into the form A = AE_1 + AE_2 = λ_1E_1 + λ_2E_2, where λ_1 and λ_2 are the eigenvalues of A and E_1 and E_2 represent the projections of V onto the invariant subspaces of A.

(a)  [ 1  2 ]        (b)  [ 1   2 ]        (c)  [ 1  2 ]
     [ 2  1 ]             [ 2  −2 ]             [ 2  4 ]

(d)  [ 1   3 ]       (e)  [ 1   4 ]
     [ 3  −7 ]            [ 4  −5 ]







2. Use the method of spectral decomposition to resolve the matrix 

"3 -7 -20" 



A = 



■14 



-5 
3 

into the form A = AE_1 + AE_2 + AE_3 = λ_1E_1 + λ_2E_2 + λ_3E_3, where E_1, E_2, and E_3 are orthogonal idempotent matrices.

3. For each matrix A in the above exercises compute the matrix form of e A . 

4. Use the method of spectral decomposition to resolve the matrix 

1-3 3" 
3-2 
-1 -1 1 



A = 



into the form A = AE_1 + AE_2, where E_1 and E_2 are orthogonal idempotent matrices. Furthermore, show that AE_1 (or AE_2) is of the form AE_1 = λ_1E_1 + N_1, where λ_1 is an eigenvalue of A and N_1 is nilpotent, and that AE_2 (or AE_1) is of the form AE_2 = λ_2E_2, where λ_2 is an eigenvalue of A.

5. Let A = λE + N, where E² = E, EN = NE = N, and N is nilpotent of order r; that is, N^{r−1} ≠ 0 but N^r = 0. Show that

e^A = I + E(e^λ − 1) + e^λ Σ_{s=1}^{r−1} N^s/s!
    = (I − E)(1 − e^λ) + e^λ e^N = I − E + e^λ e^N E.




6. Compute the matrix form of e A for the matrix A given in Exercise 4. 

7. Let A = AE_1 + AE_2 + ··· + AE_p, where the E_i are orthogonal idempotents and each AE_i = λ_iE_i + N_i, where E_iN_i = N_i. Show that

e^A = Σ_{i=1}^p e^{λ_i} e^{N_i} E_i.



7 | Systems of Linear Differential Equations

Most of this section can be read with background material from Chapter I, 
the first four sections of Chapter II, and the first five sections of Chapter III. 
However, the examples and exercises require the preceding section, Section 5. 
Theorem 7.3 requires Section 11 of Chapter V. Some knowledge of the theory
of linear differential equations is also needed. 

We consider the system of linear differential equations

ẋ_i = dx_i/dt = Σ_{j=1}^n a_{ij}(t) x_j,     i = 1, 2, ..., n,                                (7.1)

where the a_{ij}(t) are continuous real-valued functions for values of t, t_1 ≤ t ≤ t_2. A solution consists of an n-tuple of functions X(t) = (x_1(t), ..., x_n(t)) satisfying (7.1). It is known from the theory of differential equations that corresponding to any n-tuple X_0 = (x_{10}, x_{20}, ..., x_{n0}) of scalars there is a unique solution X(t) such that X(t_0) = X_0. The system (7.1) can be written in the more compact form

Ẋ = AX,                                                                                       (7.2)

where Ẋ(t) = (ẋ_1(t), ..., ẋ_n(t)) and A = [a_{ij}(t)].

The equations in (7.1) are of first order, but equations of higher order can be included in this framework. For example, consider the single nth order equation

d^n x/dt^n + a_{n−1} d^{n−1}x/dt^{n−1} + ··· + a_1 dx/dt + a_0 x = 0.                         (7.3)

This equation is equivalent to the system

ẋ_1 = x_2
ẋ_2 = x_3
  ⋮
ẋ_{n−1} = x_n
ẋ_n = −a_{n−1}x_n − ··· − a_1x_2 − a_0x_1.                                                    (7.4)

The formula x_k = d^{k−1}x/dt^{k−1} expresses the equivalence between (7.3) and (7.4).
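A small illustration of this equivalence (mine, not the book's): the matrix of the system (7.4) is the companion matrix of the coefficients of (7.3), and a standard initial-value routine such as scipy's solve_ivp (an assumed external tool, not part of the text) can then integrate Ẋ = AX.

    import numpy as np
    from scipy.integrate import solve_ivp

    def companion(coeffs):
        """coeffs = [a_0, a_1, ..., a_{n-1}] of (7.3); returns the matrix A of the system (7.4)."""
        n = len(coeffs)
        A = np.zeros((n, n))
        A[:-1, 1:] = np.eye(n - 1)             # x_k' = x_{k+1}
        A[-1, :] = -np.asarray(coeffs, float)  # x_n' = -a_{n-1} x_n - ... - a_0 x_1
        return A

    # x'' + x = 0, i.e. a_0 = 1, a_1 = 0; with x(0) = 1, x'(0) = 0 the solution is cos t
    A = companion([1.0, 0.0])
    sol = solve_ivp(lambda t, X: A @ X, (0.0, np.pi), [1.0, 0.0], rtol=1e-9)
    print(sol.y[0, -1])                        # approximately cos(pi) = -1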




The set of all real-valued differentiable functions defined on the interval [t_1, t_2] forms a vector space over R, the field of real numbers. The set V of n-tuples X(t) where the x_k(t) are differentiable also forms a vector space over R. But V is not of dimension n over R; it is infinite dimensional. It is not profitable for us to consider V as a vector space over the set of differentiable functions. For one thing, the set of differentiable functions does not form a field because the reciprocal of a differentiable function is not always differentiable (where that function vanishes). This is a minor objection because appropriate adjustments in the theory could be made. However, we want the operation of differentiation to be a linear operator. The condition d(ax)/dt = a dx/dt requires that a must be a scalar. Thus we consider V to be a vector space over R.

Statements about the linear independence of a set of «-tuples must be 
formulated quite carefully. For example. 

^(0 = 0,0), *.(0 = (*,0) 

are linearly independent over R even though the determinant 

1 







= 



for each value of t. This is because X x {t ) and X 2 (t ) are linearly dependent 
for each value of t , but the relation between X^) and X 2 (t ) depends on 
the value of t . In particular, a matrix of functions may have n independent 
columns and not be invertible and the determinant of the matrix might be 0. 
However, when the set of n-tuples is a set of solutions of a system of differ- 
ential equations, the connection between their independence for particular 
values of t and for all values of t is closer. 

Theorem 7.1. Let t be any real number, t x < t < t 2 , and let {X^t), . . . , 
X m (t)} be a finite set of solutions o/(7.1). A necessary and sufficient condition 
for the linear independence of {X-^t), . . . , X m (t)} is the linear independence 
ofiX^t,), . . . , X m (t )}. 

proof. If the set {X^t), . . . , X m (t)} is linearly dependent, then certainly 
(XiOo), . . . , X m (t )} is linearly dependent. On the other hand, suppose 
there exists a non-trivial relation of the form 

m 

Zc k X k (t ) = 0. (7.5) 

Consider the function X(t) = £™ =1 c k X k (t). X(t) is a solution of (7.1). 
But the function Y(t) = is also a solution and satisfies the condition 
Y(t Q ) = = X(t ). Since the solution is unique, we must have X(t) = 0. 



280 Selected Applications of Linear Algebra | VI 

It follows immediately that the system (7.1) can have no more than n 
linearly independent solutions. On the other hand, for each k there exists 
a solution satisfying the condition X k (t ) = (d lk , . . . , d nk ). It then follows 
that the set {X x {t), . . . , XJfj) is linearly independent. □ 

It is still more convenient for the theory we wish to develop to consider 
the differential matrix equation 

F= AF (7.6) 

where F(t) = [/^(Ol is an n X w matrix of differentiable functions and 
F = [fljit)]. If an F is obtained satisfying (7.6), then each column of F 
is a solution of (7.2). Conversely, if solutions for (7.2) are known, a solution 
for (7.6) can be obtained by using the solutions of (7.2) to form the columns 
of F. Since there are n linearly independent solutions of (7.2) there is a 
solution of (7.6) in which the columns are linearly independent. Such a 
solution to (7.6) is called a. fundamental solution. We shall try to find the 
fundamental solutions of (7.6). 

The differential and integral calculus of matrices is analogous to the cal- 
culus of real functions, but there are a few important differences. We have 
already defined the derivative in the statement of formula (7.6). It is easily 
seen that 

l (F + G) = ^ + ^, (7.7) 

at at at 

and 

1 (FG) = — G + F — . (7.8) 

dt dt dt 

Since the multiplication of matrices is not generally commutative, the 
order of the factors in (7.8) is important. For example, if F has an inverse 
F- 1 , then 

= £ (f- x F) = ^^ F + F' 1 — . (7.9) 

dt dt dt 

Hence, 

^ F-1 > = -p^ — F- 1 . (7.10) 

dt dt 

This formula is analogous to the formula (dx^/dt) = —x~\dx\dt) in 
elementary calculus, but it cannot be written in this simpler form unless F 
and dFjdt commute. 

Similar restrictions hold for other differentiation formulas. An important 
example is the derivative of a power of a matrix. 

dFl^dl F + F dF (7n) 

dt dt dt 



7 | Systems of Linear Differential Equations 281 

Again, this formula cannot be further simplified unless Fand dF\dt commute. 
However, if they do commute we can also show that 

*£l = kF ^-i4L m (7.12) 

dt dt 

Theorem 7.2. If F is a fundamental solution of (7.6), then every solution 
G of (7.6) is of the form G = FC where C is a matrix of scalar s. 

proof. It is easy to see that if F is a solution of (7.6) and C is an n x n 
matrix of scalars, then FC is also a solution. Notice that CF is not necessarily 
a solution. There is nothing particularly deep about this observation, but it 
does show, again, that due care must be used. 

Conversely, let F be a fundamental solution of (7.6) and let G be any 
other solution. By Theorem 7.1, for any scalar t F(t ) is a non-singular 
matrix. Let C = Fi^G^). Then H = FC is a solution of (7.6). But 
H(t Q ) = F(t )C = G(t ). Since the solution satisfying this condition is 
unique we have G = H = FC. O 



Let 



Jto 



B(t) = A(s) ds (7.13) 



where B(t) = [& w (f)l and b u {t) — JJ a tj (s) ds. We assume that A and B 

dB k 
commute so that — — = kAB*- 1 . Consider 
dt 

e B =lf~. (7-14) 

7c=o k ! 

Since this series converges uniformly in any finite interval in which all the 
elements of B are continuous, the series can be differentiated term by term. 
Then 

df_ = | /cAB^ 1 = ^| & = AeB . (7 15) 

dt k=o k\ k=o k ! 

that is, e B is a solution of (6.6). Since e B{to) = e° = I, e B is a fundamental 
solution. The general solution is 

F(t) = e mt) C. (7.16) 

In this case, under the assumption that A and B commute, e B is also a 
solution of F = FA. 
As an example of the application of these ideas consider the matrix 

1 It 



A = 



It 1 



282 

and take t = 0. Then 



Selected Applications of Linear Algebra | VI 



B = 



■f t 2 ' 
t 2 t 



It is easily verified that A and B commute. The characteristic polynomial 
of B is (x - t - t 2 )(x - t + t 2 ). Taking X x = t + t 2 and X 2 = t - t 2 , we 

have e x (x) = — (x — t + t 2 ) and e 2 (x) = — — (x — t — t 2 ). Thus, 



1 
I = 2 



"i r 


i 




+ - 


i i_ 


2 



1 
-1 



is the corresponding resolution of the identity. By (6.28) we have 



e t+t* 

2 


"1 f 
.1 1. 


t-t 2 


1 - 
.-1 


r 


1 


V+' 2 + e*-* 2 e t+t * - e*-* 2 ' 




2 


J 


^-, 


t-t 2 e t+i 


2 +^ 2 J 





Since e B is a fundamental solution, the two columns of e B are linearly 
independent solutions of (7.2), for this A. 

If A is a matrix of scalars, B = (t — t )A certainly commutes with A 
and the above discussion applies. However, in this case the solution can be 
obtained more directly without determining e B . If C = (c l5 . . . , c„), 
Ci e R, represents an eigenvector of A corresponding to I, that is, AC = AC, 
then BC = (t — t )AC so that {t — t Q )X is an eigenvalue of B and C represents 
an eigenvector of B. Then 



X(t) = (c^™-^, c 2 e Mt - to \ 



> £«£ 



Mt-t ) 



(7.17) 



is a solution of (7.2), as can easily be verified. Notice that X(t ) = C. 

Conversely, suppose that X{t) is a solution of (7.2) such that X(t ) = C 
is an eigenvector of A. Since X(t) is also a solution of (7.2), 

Y{t)=X-XX (7.18) 

is also a solution of (7.2). But Y = X - XX = AX - XX = (A - XI)X for 
all t, and F(f ) = (A — XI)C = 0. Since the solution satisfying this condition 
is unique, Y(t) = 0, that is, X = XX. This means each x,(f) must satisfy 

x t {t) = Xx,{t), Xj (t ) = c,. (7.19) 

Thus X(t) is given by (7.17). This method has the advantage of providing 
solutions when it is difficult or not necessary to find all eigenvalues of A. 



7 | Systems of Linear Differential Equations 283 

Return now to the case where A is not necessarily a matrix of constants. 
Let G be a fundamental solution of 

G = -A T G. (7.20) 

This system is called the adjoint system to (7.6). It follows that 

WL^D. = q t F + G T F = {-A T G) T F + G T AF (7.21) 

dt 

= G T (-A)F + G T AF = 0. 

Hence, G^F = C, a matrix of scalars. Since F and G are fundamental 
solutions, C is non-singular. 
An important special case occurs when 

A T = -A. (7.22) 

In this case (7.6) and (7.20) are the same equation and the equation is said 
to be self-adjoint. Then F is also a solution of the adjoint system and we see 
that F T F = C. In this case C is positive definite and we can find a non- 
singular matrix D such that D T CD = /. Then 

D T F T FD = I = (FD) T (FD). (7.23) 

Let FD = N. Here we have N T N = / so that the columns of N are ortho- 
normal solutions of (7.2). 

Conversely, let the columns of F be an orthonormal set of solutions of 
(7.2). Then 

= d(F^F) = p Tp + pTf = {AF) T F + f t af = F T( A T + A)F (7 24) 
dt 

Since F and F T have inverses, this means A T + A = 0, so that (7.6) is 
self-adjoint. Thus, 

Theorem 7.3. The system (7.2) has an orthonormal set of n solutions if 
and only if A T = —A. □ 



BIBLIOGRAPHICAL NOTES 

E. A. Coddington and N. Levinson, Theory of Ordinary Differential Equations; F. R. 
Gantmacher, Applications of the Theory of Matrices; W. Hurewicz, Lectures on Ordinary 
Differential Equations; and S. Lefschetz, Differential Equations, Geometric Theory, contain 
extensive material on the use of matrices in the theory of differential equations. They differ 
in their emphasis, and all should be consulted. 



284 Selected Applications of Linear Algebra | VI 

EXERCISES 



1. Consider the matric differential equation 

"1 2 
X = 



2 1 



X = AX. 



Show that (-1,1) and (1, 1) are eigenvectors of A. Show that X x = (—<?-<*-*<>>, 
^-(t-*o>) and X 2 = (e 3(t_ 'o> 5 c 3(t— « )) are solutions of the differential equation. 
Show that 

" _ e -U-t ) g3(t-*o>~ 

F = 

e -«-< ) e 3(t-t ) 

is a fundamental solution of F = AF. 

Go through a similar analysis for the equation X = AX for each matrix A given 
in Exercises 1 and 2 of Section 6. 

2. Let /I = IE + N where iV is idempotent and EN = NE = N. Show that 

e At = I - E + e u e Nt E. 

3. Let Y = e Nt where TV is nilpotent. Show that 

Y = Ne Nt . 

(This is actually true for any scalar matrix N. However, the fact that N is nilpotent 
avoids dealing with the question of convergence of the series representing e N and 
the question of differentiating a uniformly convergent series term by term.) 

4. Let A = AE X + ■ ■ ■ + AE P where the E { are orthogonal idempotents and 
each AE t is of the form AE t = \E t + N f where ^iV* = N& = N t and N t is 
nilpotent. Show that v 

B =e At =£ e Xit e N ^Ei. 
Show that i=1 

B = AB. 

8 I Small Oscillations of Mechanical Systems 

This section requires a knowledge of real quadratic forms as developed 
in Section 10 of Chapter IV, and orthogonal diagonalization of symmetric 
matrices as achieved in Section 1 1 of Chapter V. If the reader is willing to 
accept the assertions given without proof, the background in mechanics 
required is minimal. 

Consider a mechanical system consisting of a finite number of particles, 
{/*!, . . . , P r }. We have in mind describing the motions of such a system 
for small displacements from an equilibrium position. The position of each 
particle is specified by its three coordinates. Let (x 3i _ 2 , x 3i _ t , x 3i ) be the 
coordinates of the particle iV Then (x lt x tt . . . , x 3r ) can be considered to 



8 | Small Oscillations of Mechanical Systems 285 

be the coordinates of the entire system since these 3r coordinates specify 
the location of each particle. Thus we can represent the configuration of a 
system of particles by a point in a space of dimension n = 3r. A space of 
this kind is called a phase space. 

The phase space has not been given enough structure for us to consider 
it to be a vector space with any profit. It is not the coordinates but the 
small displacements we have to concentrate our attention upon. This is a 
typical situation in the applications of calculus to applied problems. For 
example, even though y =f(x lt ...,*„) may be a transcendental function 
of its variables, the relation between the differentials is linear, 

d 9 -&-d* l + p!-dx t + --+2!-dx.. (8.1) 

dx x ox 2 ox n 

In order to avoid a cumbersome change of notation we consider the co- 
ordinates (x lt . . . , x n ) to be the coordinates of a displacement from an 
equilibrium position rather than the coordinates of a position. Thus the 
equilibrium point is represented by the origin of the coordinate system. 
We identify these displacements with the points (or vectors) in an «-dimen- 
sional real coordinate space. 

The potential energy V of the system is a function of the displacements, 

V = V(x x , . . . , x n ). The Taylor series expansion of V is of the form 

IdV dV 

V = V + — x x + •••+—- x r 

ox„ 



1/dV , . d 2 v 2 , . d 2 v . 2 , „ d 2 v 

2 X n 

i 

d 2 V 



+ - I X X x + : X 2 + + - - X n -h L X X X 2 

2\dx 2 dx 2 dx 2 dx x dx 2 



+ . . . + 2 -— x n _ x x n \ + • ■ ■ 



(8.2) 



We can choose the level of potential energy so that it is zero in the equi- 
librium position; that is, V = 0. The condition that the origin be an 

dV dV 

equilibrium position means that — = • • • = - — = 0. If we let 



dx x dx, 

d 2 V d 2 V 



= a i4 = 



dx, dx, dx d dx. 



= a M , (8.3) 



then 



V = \ 2 WW +••■■ (8.4) 



286 Selected Applications of Linear Algebra | VI 

If the displacements are small, the terms of degree three or more are small 
compared with the quadratic terms. Thus, 

V = ; I *i*n*i (8-5) 

is a good approximation of the potential energy. 

We limit our discussion to conservative systems for which the equilibrium 
is stable or indifferent. If the equilibrium is stable, any displacement must 
result in an increase in the potential energy; that is, the quadratic form in 
(8.5) must be positive definite. If the equilibrium is indifferent, a small 
displacement will not decrease the potential energy; that is, the quadratic 
form must be non-negative semi-definite. 

The kinetic energy Tis also a quadratic form in the velocities, 

T = \ I iib u x t . (8.6) 

In this case the quadratic form is positive definite since the kinetic energy 
cannot be zero unless all velocities are zero. 
In matrix form we have 

V = \X T AX, (8.7) 

and 

T = hX T BX, (8.8) 

where A = [a it ], B = [b it \ t X = (x x , . . . , x n ), and X = (x lt ... , x n ). 

Since B is positive definite, there is a non-singular real matrix Q such 
that Q T BQ = I. Since Q T AQ = A' is symmetric, there is an orthogonal 
matrix Q' such that Q' T A'Q' = A" is a diagonal matrix. Let P = QQ'. 
Then 

P T AP = Q' T Q T AQQ' = Q' T A'Q' = A" (8.9) 

and 

P T BP = Q' T Q T BQQ' = Q' T IQ' = Q' T Q' = I. (8.10) 

Thus P diagonalizes A and B simultaneously. (This is an answer to Exercise 3 , 
Chapter V- 11.) 
If we set Y = P' 1 X, then (8.7) and (8.8) become 



and 



V = lY T A"Y = \j l a i y i t , (8.11) 

2i=l 



T = \Y T Y=\j,y?, (8.12) 

2i=l 



where a t is the element in the ith place of the diagonal of A". 



8 | Small Oscillations of Mechanical Systems 287 

In mechanics it is shown that the Lagrangian L= T — V satisfies the 
differential equation 

d_/dL\ _dL = 0i j = 1, . . . , n . (8.13) 

dt\dyj dyi 

For a reference, see Goldstein, Classical Mechanics, p. 18. Applied to 
(8.11) and (8.12), this becomes 

ft + ««y< = 0, i = 1,...,/i. (8.14) 

If A is positive definite, we have a t > 0. If A is non-negative semi-definite, 
we have a t > 0. For those a t > 0, let a f = a>< 2 where eo* > 0. The solutions 
of (8.14) are then of the form 

y .(t) = Cj cos {(o f t + e t ), j=l, ... ,n. (8.15) 

p = [p if ] is the matrix of transition from the original coordinate system 
with basis A = {a l5 . . . , aj to a new coordinate system with basis B = 
{fi lt ..., p n }; that is, & = 27=i/>» a r Thus ' 

*i(0 = 2p< t 9#) = IPijCf cos ((Oft + 0,) (8.16) 

0=1 3=1 

in the original coordinate system. If the system is displaced to an initial 
position in which 

y k (0) =l,t/,(0) = for j * k, (8.17) 

&(0) = 0, j=l,...,n, 



then 
and 



or 



y k (t) = cos (o k t, yf(t) = for j 5* k, (8.18) 

x A t ) = Pik cos °V> ( 8 - 19 ) 

*(0 = (Pi^P2 k , • • • > />»*) cos w *'- ( 8 - 2 °) 

|8 fc is represented by (p lk ,p 2k , ... , p nk ) in the original coordinate system. 
This w-tuple represents a configuration from which the system will vibrate 
in simple harmonic motion with angular frequency (o k if released from rest. 
The vectors {&, . . . , £„} are called the principal axes of the system. They 
represent abstract "directions" in which the system will vibrate in simple 
harmonic motion. In general, the motion in other directions will not be 
simple harmonic, or even harmonic since it is not necessary for the w, to be 
commensurable. The coordinates (y lf . . . , y n ) in this coordinate system 
are called the normal coordinates. 

We have described how the simultaneous diagonalization of A and B 



288 Selected Applications of Linear Algebra | VI 

can be carried out in two steps. It is often more convenient to achieve the 
diagonalization in one step. Consider the matric equation 

(A - XB)X = 0. (8.21) 

This is an eigenvalue problem in which we are asked to find a scalar X for 
which the equation has a non-zero solution. This means we must find a 
A for which 

det(A - XB) = 0. (8.22) 

Using the matrix of transition P given above we have 

= det P T ■ det(A - XB) • det P = det(P T (A - XB)P) 
= det(P T AP - XP T BP) = det{A" - XI). (8.23) 

Since A" is positive definite or non-negative semi-definite the eigenvalues 
are >0. In fact, these eigenvalues are the a t of formula (8.11). 

Let X x and X 2 be eigenvalues of equation (8.21) and let X x and X 2 be 
corresponding solutions. If X x ^ X 2 , then 

Aj AX 2 = Aj {X 2 BX 2 ) = X 2 X X BX 2 
= (A T X 1 ) T X 2 = (AX 1 ) T X 2 

= (X l BX 1 ) T X 2 = X X X X T BX 2 . (8.24) 

Thus, 

X t T BX 2 = 0. (8.25) 

This situation is described by saying that X x and X 2 are orthogonal with 
respect to B. This is the same meaning given to this term in Chapter V-l 
where an arbitrary positive definite quadratic form was selected to determine 
the inner product. 

This argument shows that if the eigenvalues of (8.21) are distinct, then 
there exists an orthonormal basis with respect to the inner product defined 
by B. We must show that such a basis exists even if there are repeated 
eigenvalues. Let a be the linear transformation represented by A" with 
respect to the basis 8, that is, cr(^) = a^. Let X t be the representation of 
fa with respect to the basis A, Xj = (p Xj , . . . ,p ni ). The matrix representing 
a with respect to A is PA"P~ X . Thus, 

PA"P~ 1 X j = djX,. (8.26) 

Then 

AX, = (P T )- 1 P T APP~ 1 X j 

= (P T )- 1 A"P~ 1 X j 

= {P T )- 1 P- X PA"P- X X J 

= (P^P-^X, 

= a t BX,. (8.27) 



8 I Small Oscillations of Mechanical Systems 289 



Fig. 5 




mi 



Since (A — a^Xj = 0, we see that a t is an eigenvalue of (8.21) and X t 
is a corresponding eigenvector. Since the columns of P are the X iy the 
condition (8.10) is equivalent to the statement that the X, are orthonormal 
with respect to the inner product defined by B. 

We now have two related methods for finding the principal axes of the 
given mechanical system: diagonalize B and A in two steps, or solve 
the eigenvalue problem (8.21). Sometimes it is not necessary to find all the 
principal axes, in which case the second method is to be preferred. Both 
methods involve solving an eigenvalue problem. This can be a very difficult 
task if the system involves more than a few particles. If the system is highly 
symmetric, there are other methods for finding the principal axes. These 
methods are discussed in the next sections. 

We shall illustrate this discussion with a simple example. Consider the 
double pendulum in Fig. 5. Although there are two particles in the system, 
the fact that the pendulum rods are rigid and the system is confined to a 
plane means that the phase space is only 2-dimensional. This example also 
illustrates a more general situation in which the phase space coordinates are 
not rectangular. 

The potential energy of the system is 

V = gnhil - I cos xj + gm 2 (2l - I cos x x — I cos x 2 ) 
= gl[(mi + 2w 2 ) — K + m 2 ) cos x x — m 2 cos x 2 ]. 
The quadratic term is 

v = \ K m i + m 2>i 2 + rn 2 x 2 2 }. 



290 



Selected Applications of Linear Algebra | VI 




r*i 



-2x1 



Fig. 6 



The kinetic energy is 

T = ^ntxilxj) 2 + \m 2 l 2 [x x 2 + x 2 2 + 2 cos (x x + x 2 )x x x 2 ]. 

The quadratic term is 

T = £/ 8 [(/»i + Wa)^ 2 + 2m 2 x x x 2 + m 2 i 2 2 ]. 

To simplify the following computation we take m 1 = 3 and m 2 = 1 . Then 
we must simultaneously diagonalize 



A=gl 



Solving the equation (8.22), we find X x = (2g/3l), A 2 = 2g/l. This gives 
o>i = y/2g/3l, co 2 = \2g/l. The coordinates of the normalized eigenvectors 

are X x = — -= (1, 2), X 2 = — (1, —2). The geometrical configuration for 

the principal axes are shown in Fig. 6. 

The idea behind the concept of principal axes is that if the system is started 
from rest in one of these two positions, it will oscillate with the angles x x 
and x 2 remaining proportional. Both particles will pass through the vertical 
line through the pivot point at the same time. The frequencies of these two 



4 0" 


B = I 2 


-4 1 


1 


i 


1 1 



8 | Small Oscillations of Mechanical Systems 291 

modes of oscillation are incommensurable, their ratio being %/3. If the 
system is started from some other initial configuration, both modes of 
oscillation will be superimposed and no position will ever be repeated. 



BIBLIOGRAPHICAL NOTES 

A simple and elegant treatment of the physical principles underlying the theory of 
small oscillations is given in T. von Karman and M. A. Biot, Mathematical Methods in 
Engineering. 



EXERCISES 

Consider a conservative mechanical system in the form of an equilateral triangle 
with mass M at each vertex. Assume the sides of the triangle impose elastic forces 
according to Hooke's law ; that is, if j is the amount that a side is stretched from an 
equilibrium length, then ks is the restoring force, and ks 2 /2 = ft ku du is the 
potential energy stored in the system by stretching the side through the distance s. 
Assume the constant k is the same for all three sides of the triangle. Let the triangle 
be placed on the x, y coordinate plane so that one side is parallel to the a>axis. 
Introduce a local coordinate system at each vertex so that the that the coordinates 
of the displacements are as in Fig. 7. We assume that displacements perpendicular 
to the plane of the triangle change the lengths of the sides by negligible amounts 
so that it is not necessary to introduce a z-axis. All the following exercises refer 
to this system. 

1. Compute the potential energy as a function of the displacements. Write 
down the matrix representing the quadratic form of the potential energy in the 
given coordinate system. 

2. Write down the matrix representing the quadratic form of the kinetic energy 
of the system in the given coordinate system. 



yz 



*3 



A / \ k 


t 




yi 

— >- 



X\ X2 

Fig. 7 



292 Selected Applications of Linear Algebra | VI 

3. Using the coordinate system, 



z x =x 2 - x z y' 1 =y 2 - y z 
*2= X 3- X l 2/2=2/3— Vl 

x z = x * y' z = y%- 



The quadratic form for the potential energy is somewhat simpler. Determine the 
matrix representing the quadratic form for the potential energy in this coordinate 
system. 

4. Let the coordinates for the displacements in the original coordinate system 
be written in the order, (x lt y x , x 2 , y 2 , x 3 , y 3 ). Show that (1, 0, 1, 0, 1, 0) and 
(0, 1, 0, 1, 0, 1) are eigenvectors of the matrix V representing the potential energy. 
Give a physical interpretation of this observation. 

9 I Representations of Finite Groups by Matrices 

For background the material in Chapters I, II, III, and V is required, 
except for Section 8 of Chapter III. A knowledge of elementary group 
theory is also needed. Appreciation of some of the results will be enhanced 
by a familiarity with Fourier transforms. 

The theory of representations of finite groups is an elegant theory with its 
own intrinsic interest. It is also a finite dimensional model of a number 
of theories which outwardly appear to be quite different; for example, 
Fourier series, Fourier transforms, topological groups, and abstract harmonic 
analysis. We introduce the subject here mainly because of its utility in finding 
the principal axes of symmetric mechanical systems. 

We have to assume a modest knowledge of group theory. In order to be 
specific about what is required we state the required definitions and theorems. 
We do not give proofs for the theorems. They are not difficult and can be 
considered to be exercises, or their proofs can be found in the first few pages 
of any standard treatment of group theory. 

Definition. A group G is a set of elements in which a law of combination is 
defined having the following properties : 

(1) If a, b e G, then ab is uniquely defined by the law of combination and 
is an element of G. 

(2) (ab)c = a(bc), for all a, b, c e G. 

(3) There exists an element e e G, called the unit element, such that 
ea = ac = a for each a e G. 

(4) For each a e G there exists an element a -1 , called the inverse of a, 
such that a _1 a = aa _1 = e. 

Although the law of combination is written multiplicatively, this does 
not mean that it has anything to do with multiplication. For example, the 



9 | Representations of Finite Groups by Matrices 293 

field of rational numbers is a group under addition, and the set of positive 
rational numbers is a group under multiplication. A vector space is a group 
under vector addition. If the condition 

(5) ab = ba 

is also satisfied, the group is said to be commutative or abelian. 

The number of elements in a group is called the order of the group. We 
restrict our attention to groups of finite order. 

A subset of a group which satisfies the group axioms with the same law of 
combination is called a subgroup. Let G be a group and S a subgroup of G. 
For each a £ G, the set of all b £ G such that a _1 b £ S is called a left coset of 
S defined by a. By aS we mean the set of all products of the form ac where 
c e S. Then a _1 b £ S is equivalent to the condition b £ aS; that is, aS is the 
left coset of S determined by a. Two left cosets are equal, aS = bS, if and 
only if S = a^bS or a _1 b £ S; that is, if and only if b £ aS. The number of 
elements in each coset of S is equal to the order of S. A right coset of S is of 
the form Sa. 

Theorem 9.1. If G is of finite order and S is a subgroup of G, the order 
of S divides the order of G. □ 

If S is a subgroup such that its right and left cosets are equal — that is, 
aS = Sa for each a £ G — then S is said to be an invariant or normal subgroup 
of G. If S is normal, then (aS)(bS) = a(Sb)S = a(bS)S = (ab)SS = abS so 
that the product of two cosets is a coset. 

Theorem 9.2. If S is a normal subgroup ofG, the cosets of S form a group 
under the law of combination (aS)(bS) = abS. □ 

If S is a normal subgroup of G, the group of cosets is called the factor 
group of G by S and is denoted by G/S. 

If G x and G 2 are groups, a mapping /of G x into G 2 is called a homomorphism 
if /(ab) =/(a)/(b) for all a, b £ G x . Notice that the law of combination 
on the left is that in G l5 while the law of combination on the right is that in G 2 . 
If the homomorphism is one-to-one, it is called an isomorphism. If e is the 
unit element in G 2 , the set of all a £ G x such that/(a) = e is called the kernel 
of the homomorphism. 

Theorem 9.3. (The homomorphism theorem). If G 2 is the image of G 2 
under the homomorphism f and K is the kernel off, then G 2 is isomorphic 
to GJK, where f (a) £ G 2 corresponds to aK £ GJK. □ 

An isomorphism of a group onto itself is called an automorphism. For 
each fixed a £ G, the mapping / a of G onto itself defined by / a (x) = a _1 xa 
is an automorphism. It is called an inner automorphism. 



294 Selected Applications of Linear Algebra | VI 

Two group elements b x and b 2 are said to be conjugate if there is an a e G 
such that a _1 b 1 a = b 2 . For each b e G, the set of conjugates of b is called 
the conjugate class determined by b. The conjugate class determined by b 
is the set of all images of b under all possible inner automorphisms. The 
normalizer N b of b is the set of all a such that the inner automorphism / a 
leaves b fixed. 

Theorem 9.4. The normalizer N b is a subgroup of G. The number of con- 
jugates ofb is equal to the number of left cosets ofN b . The number of elements 
in conjugate class is a divisor of the finite order of the group. □ 

We have already met several important groups. The set of all n x n 
matrices with non-zero determinant forms a group under matrix multipli- 
cation. This group is infinite. There are several important subgroups; 
the matrices with determinant ±1, the orthogonal matrices, etc. These 
are also infinite. There are also a large number of finite subgroups. Given 
a finite group, representation theory seeks to find groups of matrices which 
are models, in some sense, of the given group. 

Definition. Let G be a given finite group. A representation of G is a group 
D(G) of square matrices (under the operation of matrix multiplication) 
with a homomorphism mapping G onto D(G). Notice that the homo- 
morphism must be specified and is part of the representation. If the mapping 
is an isomorphism, the representation is said to be faithful. The matrices 
in D(G) represent linear transformations on a vector space V over C. The 
corresponding linear transformation also form a group homomorphic to G, 
and we also say that this group of transformations represents G. The 
dimension of V is called the dimension of the representation. 

If a e G, we denote by D(a) the matrix in D(G) corresponding to a under 
the homomorphism of G onto D(G). <x a will denote the corresponding linear 
linear transformation on V. 

Since <r a = a & a b -ia h and the rank of a product is less than or equal to 
the rank of any factor, we have piaj < p(o h ). Similarly, we must have 
p(°b) ^ P( a t)> an d hence the ranks of all matrices in D(G) must be equal. 
Let their common rank be r. Then a a (V) = S a is of dimension r, and 
a &2 (V) = o- a (a & (V)) = a & (S & ) is also of dimension r. Since o^SJ c: S a , 
we have cr a (S a ) = S a . Also S a = <r a (V) = a^a^a^V)) <= <r b (V) = S b . 
Similarly, S b <= S a so that S a = S b . This means that all linear transformations 
representing G are automorphisms of some subspaces S of dimension r in 
V, and cr a (V) = S for all a e G. Thus, we may as well restrict our attention 
to S, and in the future we assume, without loss of generality, that the matrices 
and transformations representing a group are non-singular, that is, S = V. 

A subspace U <= V such that a a (U) <= U for all a e G is called an invariant 



9 | Representations of Finite Groups by Matrices 295 

subspace of V under D(G). Since <r a is non-singular, actually <r & (U) = U. 
This means that U is also a representation space of G over C. If V has a 
proper invariant subspace we say that V and the representation are reducible. 
Otherwise the representation and the space are said to be irreducible. If 
there is another proper invariant subspace W such that V = U ® W, we say 
that V and the representation are completely reducible. Notice that if U x 
and U 2 are invariant under D(G), then U 1 n U 2 is also invariant under D{G). 

Theorem 9.5. For a finite group G a reducible representation is completely 
reducible. 

proof. Let U be an invariant subspace. Let T be any subspace com- 
plementary to U, and let -n be the projection of V onto U along T. Since 
<7 a (Lf) = U, itojt = (r a 7r. Thus, in a complicated product of tr's and 7r's, 
all 7r's to the left of the last one can be omitted; for example, TTa t jra h 7ra c = 
cr a (r a 7r<r c . Consider 

T = - 2 ffa-VTTffa, (9.1) 

g a 

where g is the order of G, and the sum is taken over all a e G. 

An expression like (9.1) is called a symmetrization of it. t has the important 
property that it commutes with each a a whereas -n might not. Specifically, 

1 v lv 

TOr b = - 2, Oa-iTrO-gCTb = - 2, ffb^b-^-^Cab 

g • g a 

g a 

The reasoning behind this conclusion is worth examining in detail. If 
G = {a l5 a 2 , . . . , a ff } is the list of the elements in G written out explicitly 
and b is any element in G, then {axb, a 2 b, . . . , a ff b} must also be a list (in a 
different order) of the elements in G because of axioms (1) and (4). Thus, as a 
runs through the group, ab also runs through the group and ^ a a (a.b)- l7Ta &b = 
2a a &- l7T(J &- Al so notice that this conclusion does not depend on the con- 
dition that 77 is a projection. The symmetrization of any linear transformation 
would commute with each <r a . 
Now, 

TTT = - ^ Oa-lTTffaTT = ~ 2 <J 2T ia ^' IT = ~ 2 "" = ""» ( 9 - 3 ) 

g * g a g a 

and 

77T = — 2 TTOa-iTTCa = — 2 a X~ 17T<J * = T - (9-4) 

g a g a 

Among other things, this shows that t and 7r have the same rank. Since 
T (V) = 7tt(V) c U, this means that t(V) = U. Then t 2 = t(ttt) = (ttt)t = 
7tt = t, so that t is a projection of V onto U. 



D(a) = 



(9.5) 



296 Selected Applications of Linear Algebra | VI 

Let W be the kernel of t. Then, for each a g G, ra & (W) = a a r(W) = 
°a(0) — 0. Thus o- a (W) c= W and W is an invariant subspace. Finally, 
it is easily seen that V = U © W. Thus, the representation is completely 
reducible. □ 

The importance of complete reducibility is that the representation on V 
induces two representations, one on U and one on W. If a basis {a l5 . . . , 
a r , /?!, . . . , fi n _ r } of V is chosen so that {a x , . . . , a r } is a basis of U and 
{&, . . . , /? n _ r } is a basis of W, then Z)(a) is of the form 

"^(a) ' 

2) 2 (a) 

where Z> x (a) is an r x r matrix and Z> 2 (a) is an n — r x n — r matrix. 
Z) 1 (a) represents a^rr on (-/, and Z> 2 (a) represents <r a (l — 77-) on W. The 
set X) 1 (G) = {D x (2l) | a g G} is a representation of G on L/, and D 2 (G) = 
(Z) 2 (a) I a e G} is a representation of G on W. We say that D(G) is the 
direct sum of Z^G) and D 2 (G) and we write D(G) = D X (G) + Z) 2 (G). 

If either the representation on U or the representation on W is reducible, 
we can decompose it into a direct sum of two others. We can proceed 
in this fashion step by step, and the process must ultimately terminate 
since at each step we obtain subspaces of smaller dimensions. When that 
point is reached we will have decomposed D(G) into a direct sum of irreduc- 
ible representations. If U is an invariant subspace obtained in one decom- 
position of V and U' is an invariant subspace obtained in another, then 
U n U' is also invariant. If U and W are both irreducible, then either 
U n U' = {0} or U = U'. Thus, the irreducible subspaces obtained are 
unique and independent of the particular order of steps in decomposing 
V. We see, then, that the irreducible representations will be the ultimate 
building blocks of all representations of finite groups. 

Although the irreducible invariant subspaces of V are unique, the matrices 
corresponding to the group elements are not unique. They depend on the 
choices of the bases in these subspaces. We say that two groups of matrices, 
D(G) = (Z)(a) I a £ G} and D'{G) = {Z)'(a) | a e G}. are equivalent repre- 
sentations of G if there is a non-singular matrix P such that D'(a) = P _1 Z)(a)P 
for every a g G. In particular, if the two groups of matrices represent the 
same group of linear transformations on V, then they are equivalent and P 
is the matrix of transition. But the definition of equivalence allows another 
interpretation. 

Let V and V be vector spaces over C both of dimension n. Let/ be a one- 
to-one linear transformation mapping V onto V, and let P be the matrix 
representing/. We can define a linear transformation r a on V by the rule 

r & =f-^f. (9.6) 



9 | Representations of Finite Groups by Matrices 297 

It is easy to show that the set {r a } defined in this way is a group and that it 
is isomorphic to {a & }. The groups of matrices representing these linear 
transformations are also equivalent. 

If a representation D(G) is given arbitrarily, it will not necessarily look 
like the representation in formula (9.5). However, if D(G) is equivalent 
to a representation that looks like (9.5), we also call that representation reduc- 
ible. The importance of this notion of equivalence is that there are only 
finitely many inequivalent irreducible representations for each finite group. 
Our task is to prove this fact, to describe at least one representation for each 
class of equivalent irreducible representations, and to find an effective 
procedure for decomposing any representation into its irreducible com- 
ponents. 

Theorem 9.6 {Schur's lemma). Let D\G) and D 2 (G) be two irreducible 
representations. If T is any matrix such that TD 1 ^) = D 2 (n)Tfor all a e G, 
then either T = or T is non-singular and D\G) and D 2 (G) are equivalent. 

proof. Let V x of dimension m be the representation space of D\G), 
and let V 2 of dimension n be the representation space of D 2 (G). Then T 
must have n rows and m columns, and so may be thought of as representing 
a linear transformation / of V x into V 2 . Let a x a be the linear transformation 
on V x represented by D^a) for a e G, and let <r 2 a be the linear transformation 
on V 2 represented by D 2 (a). Since fa la = o 2 Jf, ^J/TO] =/[<r 1> ,(V 1 )] = 
/(V x ) so that f(V x ) is an invariant subspace of V 2 . Since V 2 is irreducible, 
either/(Vi) = and hence T = 0, or else/(V 1 ) = V 2 . 

If a ef-i(0), then/[or la ( a )] = a 2 Jf(a.)] = (T 2 , a (0) = 0, so that (r 1>B (a) e 
/-HO). Thus /"HO) is an invariant subspace of V v Since V x is irreducible, 
either /-HO) = 0, or else/- x (0) = V x in which case T = 0. 

Thus, either /(V x ) = 0, /"HO) = V lt and T = 0; or else f(Vd = V 2 , 
/-i(0) = 0, and T is non-singular. In the latter case the representations are 
equivalent. □ 

Theorem 9.7. Let D{G) be an irreducible representation over the complex 
numbers. If T is any matrix such that TD(n) = D(a)T for all a 6 G, then 
T = XI where X is a complex number. 

proof. Let X be an eigenvalue of T. Then (T - XI)D{a) = D(n)(T -XI) 
and, by Theorem 9.6, T — XI is either or non-singular. Since T — XI 
is singular, T = XL o 

Theorem 9.8. If D(G) is an irreducible representation such that any two 
matrices in D(G) commute, then all the matrices in D(G) are of first order. 

proof. By Theorem 9.7, all matrices in D(G) must be of the form XL 
But a set of matrices of this form can be irreducible only if all the matrices 
are of first order. □ 



298 Selected Applications of Linear Algebra | VI 

With the proof of Schur's lemma and the theorems that follow im- 
mediately from it, the representation spaces have served their purpose. 
We have no further need for the linear transformations and we have to be 
specific about the elements in the representing matrices. Let Z>(a) = [tf„(a)J. 

Theorem 9.9. Every representation of a finite group over the field of 
complex numbers is equivalent to a representation in which all matrices are 
unitary. 

proof. Let D{G) be the representation of G. Consider 

H = 2 D(*)*D(n), (9.7) 

a 

where Z>(a)* is the conjugate transpose of Z>(a). Each Z>(a)*D(a) is a 
positive definite Hermitian form and H, as the sum of positive definite 
Hermitian forms, is a positive definite Hermitian form. Thus, there is a 
non-singular matrix P such that P*HP = I. But then 

(p- 1 D(a)P)*(p- 1 D(a)P) = P*D(*)*p*- 1 p- 1 D(*)P 
= P*D(n)*H D(a)P 



= Wj D(a)*D(b)*D(b)D(a)\p 
= P*(2 D(ba)*D(ba)]p 



= P*HP = I. 
Thus, each P _1 Z)(a)P is unitary, as we wished to show. □ 

For any matrix A = [a^], S(A) = 2f =x «« is called the trace of A. Since 
S(AB) = 2? =1 (Z? =1 a ij b h ) = ^ l (% =1 b ji a ij ) = S(BA), the trace of a 
product does not depend on the order of the factors. Thus S{P~ X AP) = 
SiAPP' 1 ) = S(A) and we see that the trace is left invariant under similarity. 
If D l {G) and D 2 (G) are equivalent representations, then SiD 1 ^)) = S(D 2 (a)) 
for all a e G. Thus, the trace, as a function of the group element, is the same 
for any of a class of equivalent representations. For simplicity we write 
S(D(a)) = S D (a) or, if the representation has an index, 5 , (D r (a)) = S r (a). 
If a and a' are conjugate elements, a' = b _1 ab, then S^Ca') = S(Z)(b -1 )D(a) 
Z)(b)) = ^(a) so that all elements in the same conjugate class of G have the 
same trace. For a given representation, the trace is a function of the conjugate 
class. If the representation is irreducible, the trace is called a character 
of G and is written 5 r (a) = # r (a)- 

Let D r (G) and Z) S (G) be two irreducible representations. Then consider 
the matrix 

T = 2 D'itr^XDX*) (9.8) 



9 | Representations of Finite Groups by Matrices 299 

where X is any matrix for which the products in question are defined. Then 
TD s (b) = 2 DVP'W^) 

a 

= D r (b) 2 D'Qr^-^XDXab) 

a 

= D r (b)T. (9.9) 

Thus, either D r (G) and D 8 (G) are inequivalent and T = for all X, or 
D r {G) and Z^CG) are equivalent. We adopt the convention that irreducible 
representations with different indices are inequivalent. If r = s we have 
T = col, where co will depend on X. Let Z) r (a) = [o^(a)], T = [f w ], and 
let X be zero everywhere but for a 1 in they'th row, fcth column. Then 

tu = K/^KOO = for r^s. (9.10) 

a 

If r = s, then co is a function of r, y, and k and we have 

'« = 1 «W» _1 K»(«) = ">* *«• ( 9 - n ) 

a 

Notice that co d jk is independent of / and /. But we also have 

hi = I «L-(a~>L(a) 

a 

= 2, tf£i(a)<4(a _1 ) 

a 
a 

where of u is independent of j and A:. Thus, we see that o) r ik d r u = co r H d ilc = 
unless k = j and / = i, in which case to r j:j — of u . But af^ is independent 
of / and co r i:j is independent of/ Thus, we may write 

Z* r ij (*- 1 )a r kl (a) = co r d il d jk . (9.13) 

a 

In order to evaluate co r set k = j, I = i, and sum overy; 

n r 

II«r/a- 1 M i (a) = "X (9-14) 



where n r is the dimension of the representation D r (G). But since 
2"=i fl ii(a -1 ) a ii( a ) is a diagonal element of the product D r (a _1 )Z) r (a) = 
Z> r (e) = 7 r we have 

n r co r = 2 iX/a- 1 )^) = 1 1 = g. (9.15) 

a i=l a 



300 Selected Applications of Linear Algebra | VI 

Thus, 

of = £ . (9.16) 

n r 

All the information so far obtained can be expressed in the single formula 

2 arX«">«00 = - <>* <>« <$«• (9-17) 

a n r 

Multiply (9.17) by a s lt (b) and sum over /. Then 

2 1 aU^aMaM = 2 ~ «A00 *« »« *r» 

i =i a 1=1 n r 

or (9.18) 

2 fl ;xa-»b) = ■£ fl^(b) a„ a„. 

a n r 

In (9.18) set / = /, t = k, and sum over/ and k: 

n T n s n T n s „ 

121 flW»">J*(ab) =22- <40>) ** *«, 

3=1 fc=i a i=l fc=l n r 

or (9.19) 

2/(a"V(ab) = ^ Z s (b)<5 rs . 
a n r 

In particular, it we take b = e, we have 

2 /(a - V(a) = - n s d rs = g d rs . (9.20) 

a n r 

Actually, formula (9.20) could have been obtained directly from formula 
(9.17) by setting i = j, I = k, and summing over/ and k. 

Let D(G) be a direct sum of a finite number of irreducible representations, 
the representation D r (G) occurring c r times, where c r is a non-negative 
integer. Then 

^(a) = 2^ r (a), 

r 

from which we obtain 

2/(a- 1 )S 2) (a) = gcv. (9.21) 

a 

Furthermore, 

2S 2) (a- 1 )S i) (a)=g2<V 2 , (9.22) 

a r 

so that a representation is irreducible if and only if 

2 SJtT^SJ*) = g. (9.23) 



9 | Representations of Finite Groups by Matrices 301 



In case the representation is unitary, ^(ar 1 ) = a r H (a), so that the relations 
(9.17) through (9.23) take on the following forms: 



a W, 



2 fl»fl«(ab) = ^ a s it (b) d jk d rs , (9.18)' 

a n r 



Zx r (a)x s (*W = -x s Q>)Srs, ( 9 - 19 )' 

a n r 



2f(*W(*) = 8*T» ( 9 - 2 °)' 



2/(a)S 1) (a) = gc r , (9.21)' 



25^)5^) = g2c r », (9-22)' 



2 S d (h)S d (a) = g if and only if D(G) is irreducible. (9.23) 

a 

Formulas (9.19)' through (9.23)' hold for any form of the representations, 
since each is equivalent to a unitary representation and the trace is the same 
for equivalent representations. 

We have not yet shown that a single representation exists. We now do 
this, and even more. We construct a representation whose irreducible 
components are equivalent to all possible irreducible representations. Let 
G = {a l9 a 2 , . . . , aj. Consider the set V of formal sums of the form 

a = X^ + #2*2 + * • • + X g*g> X i G C - ( 9 -24) 

Addition is defined by adding corresponding coefficients, and scalar multi- 
plication is defined by multiplying each coefficient by the scalar factor. 
With these definitions, V forms a vector space over C. For each a^eGwe can 
define a linear transformation on V by the rule 

a 4 (a) = SifoHi) + x 2 (*&) + • • • + *„(*&„)> (9-25) 

^ induces a linear transformation that amounts to a permutation of the 
basis elements. Since (a i a i )(a) = a^a^a)), the set of linear transformations 
thus obtained forms a representation. Denote the set of matrices representing 
these linear transformations by R(G). R(G) is called the regular representa- 
tion, and any representation equivalent to R(G) is called a regular repre- 
sentation. 



302 Selected Applications of Linear Algebra | VI 

Let R(a) = [r i; (a)J. Then r i3 (a) = 1 if aa, = a l5 and otherwise r tf (a) = 0. 
Since aa 3 = a j if and only if a = e, we have 

S R (e) = g, (9.26) 

S R (a) = for a^e. (9.27) 

Thus, 

2 /(a-^S^a) = / (e^e) = gn r , (9.28) 

a 

so that by (9.21) a representation equivalent to D r (G) occurs n r times in 
the regular representation. 

Theorem 9.10. There are only finitely many inequivalent irreducible 
representations. 

proof. Every irreducible representation is equivalent to a component 
of the regular representation. The regular representation is finite dimen- 
sional and has only a finite number of components. □ 

Furthermore, 

| S R (^)S R (a) = S R (e)S R (e) = g 2 (9.29) 

so that by (9.22) 

I n r 8 = g. (9.30) 

r 

Let C i denote a conjugate class in G and h t the number of elements in 
the class C € . Let m be the number of classes, and m' the number of in- 
equivalent representations. Since the characters are constant on con- 
jugate classes, they are really functions of the classes, and we can define 
%i = X r ( a ) f° r any a e C t . With this notation formula (9.20)' takes the form 

m 

Ih<X< r X<' = g&rv (9-31) 

Thus, the m' w-tuples (\Jh 1 x 1 r , nK%£ , . . . , yjh m x m r ) are orthogonal and, 
hence, linearly independent. This gives m' < m. 

We can introduce a multiplication in V determined by the underlying 
group relations. If a = 2f=i x i*i an< 3 ft = ^Li Vfii* we define 

= 1( I Wi)**. (9.32) 

This multiplication is associative and distributive with respect to the pre- 
viously detained addition. The unit element is e. Multiplication is not 
commutative unless G is commutative. 



9 | Representations of Finite Groups by Matrices 303 

Consider the elements 

aeC, 

y,b = J «b = b 2 b-'ab = by, (9 33) 

a.eC t aeCj. 

and, hence, y^a = ay, for all a e V. Similarly, any sum of the y f commutes 
with every element of V. 

The importance of the y, is that the converse of the above statement is 
also true. Any element in V which commutes with every element in V is a 
linear combination of the y,. Let y = J|Li c,a, be a vector such that yoc = ay 
for every a e V. Then, in particular, yb = by, or b _1 yb = y for every beG. 
If b _1 a,b is denoted by a,, we have 

r = b- 1 (ic l a i )b = ic l b- 1 a i b 

\ i=i / i=i 

9 9 

t=l 3=1 

so that Ct = Cj. This means that all basis elements from a given conjugate 
class must have the same coefficients. Thus, in fact, 

m 

7 = 2 c i7i- 
i=i 

Since y^ also commutes with every element in V, we must have 

m 

yiyi=Ic) k y k . (9.34) 

ft=0 

Now, let Ci = 2aeCj ^ r ( a )- By exactly the same argument as before, 
C* commutes with every matrix in D r (G). Thus, C/ must be a diagonal 
matrix of the form 

C t r = % r I r (9.35) 

where I r is the identity of the rth representation. But >S(Q r ) = n r r)f and 
S{Cf) = 2 a6Q ^(i)'-(a)) = h iXi r . Thus, 

/j.y r 

^^ (9-36) 

r "lXl n r i /rt T7\ 

^ = = — = 1, (9.37) 

n r n r 

where we agree that C x is the conjugate class containing the identity e. Also, 

m 

Q r C/ = l4c; (9.38) 



304 Selected Applications of Linear Algebra \ VI 

where these c) k are the same as those appearing in equation (9.34). This 
means 

m 

a;=i 

or 

m 

Vi% r = 24^- (9-39) 

In view of equation 9.36, this becomes 

n iXi "oXj _ -y A "JcXk 



n„ n„ fc=i n. 



or 



Kxl n al = n r i^khXk 

k=l 
to 

= Xi r Ic)khkXk- (9-40) 

Thus, 

m' m to' 

2 Kx!^ai = 2 C 'A 2 **■#*" 

r=l fc=l r=l 

TO 

= 2 c i k h k S R (a), where a g Q, 

k=l 

= cj lg , (9.41) 

remembering that C x is the conjugate class containing the identity. 

Suppose that C k contains the inverses of the elements of C t . Then x/ = Xk- 
Also, observe that y i y j contains the identity h t times if C, contains the inverses 
of the elements of C t , and otherwise y^j does not contain the identity. Thus 
c^ = h t ifj = k, and c) x = if j ^ k. Thus, 

2%7x/ = f *«• (9 - 42) 

r=l «j 

Theorem 9.11. The number of inequivalent irreducible representations of a 
finite group G is equal to the number of conjugate classes in G. 

proof. With m the number of conjugate classes and m' the number of 
inequivalent irreducible representations, we have already shown that m' < m. 
Formula (9.42) shows that the w'-tuples (%/, x*> • ■ • » xf) are mutually 
orthogonal. Thus m < m' , and m = m . D 

So far the only numbers that can be computed directly from the group 
G are the c\. Formula (9.39) is the key to an effective method for computing 



*il r 




T 


r\l 


= [cJJ 


T 

V2 


-VrJ- 




T 



9 | Representations of Finite Groups by Matrices 305 

all the relevant numbers. Formula (9.39) can be written as a matrix equation 
in the form 



(9.43) 



where [c) k ] is a matrix with i fixed,;' the row index, and k the column index. 
Thus, 77/ is an eigenvalue of the matrix [c) k ] and the vector (??/, r) 2 r , . . . , rj m r ) 
is an eigenvector for this eigenvalue. This eigenvector is uniquely determined 
by the eigenvalue if and only if the eigenvalue is a simple solution of the 
characteristic equation for [cjj. For the moment, suppose this is the case. 
We have already noted that rj/ = 1. Thus, normalizing the eigenvector 
so that rji = 1 will yield an eigenvector whose components are all the eigen- 
values associated with the rth representation. 

The computational procedure: For a fixed i find the matrix [c) k ] and 
compute its eigenvalues. Each of the m eigenvalues will correspond to one 
of the irreducible representations. For each simple eigenvalue, find the 
corresponding eigenvector and normalize it so that the first component is 1 . 
From formulas (9.36) and (9.31) we have 



m m% r _ v h ^xl 



i=l h 



= 1 



(9.44) 



i=i n. 



This gives the dimension of each representation. Knowing this the characters 
can be computed by means of the formula 



%i 



n r Y]i 



(9.36) 



Even if all the eigenvalues of [c* k ] are not simple, those that are may be 
used in the manner outlined. This may yield enough information to enable 
us to compute the remaining character values by means of orthogonality 
relations (9.31) and (9.42). It may be necessary to compute the matrix 
[ci k ] for another value of i. Those eigenvectors which have already been 
obtained are also eigenvectors for this new matrix, and this will simplify 
the process of finding the eigenvalues and eigenvectors for it. 

Theorem 9.12. The dimension of an irreducible representation divides the 
order of the group. 



306 Selected Applications of Linear Algebra | VI 

proof. Multiplying (9.39) by rj t r , we obtain 

TO / TO \ 

fc=i \j)=i / 

m i to \ 

= 2 244,W- (9-45) 

»=i\%=i / 

Hence, rj/rj/ is an eigenvalue of the matrix [cj fc ] [c* p ]. If C t is taken to be 

the class containing the inverses of the elements in C i5 we have 

2f%V = lf^V 

i=l «, i=l Hi 



1 ™ — g 2 

2 ^ <^» ^* == 2 



(9.46) 



£ 



Then — is an eigenvalue of the matrix 2™i t" [ C 3 -J [ c iU- All tne coefficients 

of this matrix are integers. Hence, its characteristic polynomial has integral 
coefficients and leading coefficient 1 . A rational solution of such an equation 



must be an integer. Thus, — and, hence, — must be an integer. □ 

n/ n r 

It is often convenient to summarize the information about the characters 
in a table of the form : 



D 1 
D 2 



K h 2 • • • h m 
Q Q • • • C m 


Xl X2 ' Xm 
Xl X% ' Xm 

v m v to ... v m 
Ail a2 Am 



(9.47) 



The rows satisfy the orthogonality relation (9.31) and the columns satisfy 
the orthogonality relation (9.42): 



2 KXiXi = 8 d r 

i=l 

m 

2 KxlXi = s d i 



(9.31) 



(9.42) 



If some of the characters are known these relations are very helpful in 
completing the table of characters. 

Example. Consider the equilateral triangle shown in Fig. 8. Let (123) 
denote the rotation of this figure through 120°; that is, the rotation maps P 1 



9 | Representations of Finite Groups by Matrices 

Ps 



Fig. 8 



307 




onto P 2 , P 2 onto P 3 , and P s onto P ± . Similarly, (132) denotes a rotation 
through 240°. Let (12) denote the reflection that interchanges P 1 and P 2 
and leaves P 3 fixed. Similarly, (13) interchanges P x and P 3 while (23) inter- 
changes P 2 and P 3 . These mappings are called symmetries of the geometric 
figure. We define multiplication of these symmetries by applying first one 
and then the other; that is, (123)(12) means to interchange P x and P 2 and 
then rotate through 120°. We see that (123)(12) = (13). Including the iden- 
tity mapping as a symmetry, this defines a group G = {e, (123), (132), (12), 
(13), (23)} of symmetries of the equilateral triangle. 

The conjugate classes are C x = {e}, C 2 = {(123), (132)}, and C 3 = {(12), 
(13), (23)}. It is easy to verify that 

7272 = tyi + 72, 

7273 = 273, 

7s73 = 37i + 3y 2 . 
Thus, we have [cfj is 



"0 


r 





2 


3 


3 o_ 



The eigenvalues are 0, 3, and —3. Taking the eigenvalue i?, 1 = 3, we get the 
eigenvector (1 , 2, 3). From (9.44), we get n x = 1. For the eigenvalue rj 3 2 = 
—3 we get the eigenvector (1 ,2, -3) and n 2 = 1. For the eigenvalue r? 3 3 = 0, 
we get the eigenvector (1,-1,0) and « 3 = 2. Computing the characters 
by means of (9.36)', we get the character table 

1 2 3 

Cj C 2 I-3 



D 1 


1 


1 


1 


D 2 


1 


1 


-1 


D* 


2 


-1 






308 



Selected Applications of Linear Algebra | VI 



The dimensions of the various representations are in the first column, the 
characters of the identity. The most interesting of the three possible ir- 
reducible representations is D 3 since it is the only 2-dimensional representa- 
tion. Since the others are 1 -dimensional, the characters are the elements 
of the corresponding matrices. Among many possibilities we can take 



D 3 (e) = 



Z) 3 ((12)) = 



"1 


0" 


_0 


1_ 


"0 


r 


1 






Z> 3 ((123)) = 



£> 3 ((13)) = 



"0 




-r 


1 




-i_ 


- 


1 


0" 




1 


i_ 



Z) 3 ((132)) = 



D*((23)) = 





1 


0" 


_ 


1 


1 


"1 




-r 







-i_ 



BIBLIOGRAPHICAL NOTES 

The necessary background material in group theory is easily available in G. Birkhoff 
and S. MacLane, A Survey of Modern Algebra, Third Edition, or B. L. van der Waerden, 
Modern Algebra, Vol. 1. More information on representation theory is available in V. I. 
Smirnov, Linear Algebra and Group Theory, and B. L. van der Waerden, Gruppen von 
Linearen Transformationen. F. D. Murnaghan, The Theory of Group Representations, 
is encyclopedic. 



EXERCISES 

The notation of the example given above is particularly convenient for repre-
senting permutations. The symbol (123) is used to represent the permutation,
"1 goes to 2, 2 goes to 3, and 3 goes to 1." Notice that the elements appearing
in a sequence enclosed by parentheses are cyclically permuted. The symbol
(123)(45) means that the elements of {1, 2, 3} are permuted cyclically, and the
elements of {4, 5} are independently permuted cyclically (interchanged in this
case). Elements that do not appear are left fixed.

1. Write out the full multiplication table for the group G given in the example
above. Is this group commutative?

2. Verify that the set of all permutations described in Chapter III-1 forms a group.
Write the permutation given as illustration in the notation of this section and verify
the laws of combination given. The group of all permutations of a finite set
S = {1, 2, ..., n} is called the symmetric group on n objects and is denoted by 𝔖ₙ.
Show that 𝔖ₙ is of order n!. A subgroup of 𝔖ₙ is called a group of symmetries, or a
permutation group.

3. Show that the subset of 𝔖ₙ consisting of even permutations forms a subgroup
of 𝔖ₙ. This subgroup is called the alternating group and is denoted by 𝔄ₙ. Show
that 𝔄ₙ is a normal subgroup.

4. For any group G and any a ∈ G, let D¹(a) be the 1 × 1 unit matrix, D¹(a) =
[1]. Show that D¹(G) = {D¹(a) | a ∈ G} is a representation of G. This representa-
tion is called the identity representation.




5. Let 𝔖₃ be the symmetric group described in the example given above. We showed
there that 𝔖₃ has three inequivalent irreducible representations. One of them is
the identity representation; another is the 2 × 2 representation which we described.
Find the third one.

6. Show that any 1-dimensional representation of a finite group is always in
unitary form.

7. Give the 2 × 2 irreducible representation of 𝔖₃ in unitary form.

8. Show that a finite group G is commutative if and only if every irreducible
representation is of dimension 1.

9. Show that a finite commutative group of order n has n inequivalent irreducible
representations.

10. Let G be a cyclic group of order n. Find the n irreducible inequivalent
representations of G.

11. There are two non-isomorphic groups of order 4. One is cyclic. The other
is of the form 𝔙 = {e, a, b, c} where a² = b² = c² = e and ab = c, ac = b,
bc = a. 𝔙 is called the four-group. Find the four inequivalent irreducible repre-
sentations for each of these groups.

12. Show that if G is a group of order p², where p is a prime number, then G is
commutative.

13. Show that all groups of orders 2, 3, 4, 5, 7, 9, 11, and 13 are commutative.

14. Show that there is just one commutative group for each of the orders 6, 
10, 14, 15. 

15. Show that a non-commutative group of order 6 must have three irreducible 
representations, two of dimension 1 and one of dimension 2. Show that this 
information and the knowledge that one of the representations must be the identity 
representation determines five of the nine numbers that appear in the character 
table. How many conjugate classes can there be? What are their orders? Show 
that we now know enough to determine the remaining elements of the character 
table. Show that this information determines the group up to an isomorphism; 
that is, any two non-commutative groups of order 6 must be isomorphic. 

16. Show that if every element of a group is of order 1 or 2, then the group 
must be commutative. 

17. There are five groups of order 8 ; that is, every group of order 8 is isomorphic 
to one of them. Three of them are commutative and two are non-commutative. 
Of the three commutative groups, one contains an element of order 8, one contains 
an element of order 4 and no element of higher order, and one contains elements 
of order 2 and no higher order. Write down full multiplication tables for these 
three groups. Determine the associated character tables. 

18. There are two non-commutative groups of order 8. One of them is generated 
by the elements {a, b} subject to the relations, a is of order 4, b is of order 2, 
and ab = ba 3 . Write out the full multiplication table and determine the associated 
character table. An example of this group can be obtained by considering the 
group of symmetries of a square. If the four corners of this square are numbered, 




a representation of this group as a permutation group can be obtained. The other
group of order 8 is generated by {a, b, c} where each is of order 4, ab = c, bc = a,
ca = b, and a² = b² = c². Show that ab = b³a = ba³. Write out the full multi-
plication table for this group and determine the associated character table. Com-
pare the character tables for these two non-isomorphic groups of order 8.

The above exercises have given us a reservoir of representations of groups of 
relatively small order. There are several techniques for using these representations 
to find representations of groups of higher order. The following exercises illustrate 
some of these techniques. 

19. Let G₁ be a group which is the homomorphic image of the group G₂. Let
D(G₁) be a representation of G₁. Define a homomorphism of G₂ onto D(G₁) and
show that D(G₁) is also a representation of G₂.

20. Consider the two non-commutative groups of order 8 given in Exercise 18.
Show that H = {e, a²} is a normal subgroup (using the appropriate interpretation
of the symbol "a" in each case). Show that, in either case, G/H is isomorphic
to the four-group 𝔙. Show how we can use this information to obtain the four
1-dimensional representations for each of these groups. Show how the characters
for the remaining representation can be obtained by use of the orthogonality
relations (9.31) and (9.42).

21. In a commutative group every subgroup is a normal subgroup. In Exercise 
10 we determined the character tables for a cyclic group. Using this information 
and the technique of Exercise 19, find the character tables for the three commutative 
groups of order 8. 

22. Show that 𝔖ₙ has a 1-dimensional representation in which every element
of 𝔄ₙ is mapped onto [1] and every element not in 𝔄ₙ is mapped onto [−1].

23. Show that if D^r(G) is a representation of G of dimension n, where D^r(a) =
[a_{ij}^r(a)], and D^s(G) is a representation of dimension m, where D^s(a) = [a_{kl}^s(a)],
then D^{r×s}(G), where D^{r×s}(a) = [a_{ik,jl}^{r×s}(a)] = [a_{ij}^r(a) a_{kl}^s(a)], is a representation of G
of dimension mn. D^{r×s}(G) is known as the Kronecker product of D^r(G) and D^s(G).

24. Let S^r(a) be the trace of a for the representation D^r(G), S^s(a) the trace for
D^s(G), and S^{r×s}(a) the trace for D^{r×s}(G). Show that S^{r×s}(a) = S^r(a)S^s(a).

25. The commutative group of order 8, with no element of order higher than 2,
has the following three rows in the associated character table:

    1  −1   1   1  −1  −1   1  −1
    1   1  −1   1  −1   1  −1  −1
    1   1   1  −1   1  −1  −1  −1

Find the remaining five rows of the character table.

26. The commutative group of order 8 with an element of order 4 but no element
of order higher than 4 has the following two rows in the associated character table:

    1   1   1   1  −1  −1  −1  −1
    1   i  −1  −i   1   i  −1  −i

Find the remaining six rows of the character table.




27. Let π and σ be permutations of the set S = {1, 2, ..., n}, and let σ' = π⁻¹σπ
be conjugate to σ. Show that if σ'(i) = j, then σ(π(i)) = π(j). Let σ' be repre-
sented in the notation of the above example in the form σ' = (··· ij ···) ··· .
Show that σ is represented in the form σ = (··· π(i)π(j) ···) ··· . As an example,
let σ' = (123)(45) and π = (1432). Compute σ = πσ'π⁻¹ directly, and also replace
each element in (123)(45) by its image under π.

28. Use Exercise 27 to show that two elements of 𝔖ₙ are conjugate if and only
if their cyclic representations have the same form; for example, (123)(45) and
(253)(14) are conjugate. (Warning: This is not true in a permutation group, a
subgroup of a symmetric group.) Show that 𝔖₄ has five conjugate classes.

29. Use Exercise 28, Theorems 9.11 and 9.12, and formula (9.30) to determine
the dimensions of the irreducible representations of 𝔖₄.

30. Show that three of the conjugate classes of 𝔖₄ fill out 𝔄₄.

31. Use Exercises 22 and 30 to determine the characters for the two 1-dimen-
sional representations of 𝔖₄. Use Exercise 29 to determine one column of the
character table for 𝔖₄, the column of the conjugate class containing the identity
element.

32. Show that 𝔙 = {e, (12)(34), (13)(24), (14)(23)} is a subgroup of 𝔖₄ iso-
morphic to the four-group. Show that 𝔙 is a normal subgroup of 𝔖₄. Show that
each coset of 𝔙 contains one and only one of the elements of the set {e, (123),
(132), (12), (13), (23)}. Show that the factor group 𝔖₄/𝔙 is isomorphic to 𝔖₃.

33. Use Exercises 19 and 32 and the example preceding this set of exercises to
determine a 2-dimensional representation of 𝔖₄. Determine the character values
of this representation.

34. To fix the notation, let us now assume that we have obtained part of the
character table for 𝔖₄ in the form:





           1     6     8     6     3
          C₁    C₂    C₃    C₄    C₅

    D¹     1     1     1     1     1
    D²     1    −1     1    −1     1
    D³     2     0    −1     0     2
    D⁴     3
    D⁵     3

Show that if D⁴ is a representation of 𝔖₄, then the matrices obtained by multi-
plying the matrices in D⁴ by the matrices in D² also form a representation of 𝔖₄.
Show that this new representation is also irreducible. Show that this representation
must be different from D⁴, unless D⁴ has zero characters for C₂ and C₄.

35. Let the characters of D⁴ be denoted by

    D⁴     3     a     b     c     d.

Show that

    3 + 6a + 8b + 6c + 3d = 0
    3 − 6a + 8b − 6c + 3d = 0
    6      − 8b      + 6d = 0.

Determine b and d and show that a = −c. Show that a² = 1. Obtain the complete
character table for 𝔖₄. Verify the orthogonality relations (9.31) and (9.42) for this
table.
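The orthogonality checks called for here are easy to automate. The sketch below (not from
the text) carries them out for the 𝔖₃ character table computed in the example of this
section; the same code applies to the completed 𝔖₄ table once its five rows are known.
The row and column relations used are the standard ones and are assumed to be what
(9.31) and (9.42) express.

    import numpy as np

    # Character table of S_3 from the example: rows D^1, D^2, D^3,
    # columns C_1, C_2, C_3 with class sizes h = (1, 2, 3).
    h = np.array([1, 2, 3])
    X = np.array([[1,  1,  1],
                  [1,  1, -1],
                  [2, -1,  0]])
    g = h.sum()                      # order of the group

    # Row orthogonality: (1/g) sum_i h_i chi^r(C_i) conj(chi^s(C_i)) = delta_rs.
    print((X * h) @ X.conj().T / g)  # identity matrix
    # Column orthogonality: sum_r chi^r(C_i) conj(chi^r(C_j)) = (g / h_i) delta_ij.
    print(X.conj().T @ X)            # diag(6, 3, 2)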

10 | Application of Representation Theory to Symmetric Mechanical Systems

This section depends directly on the material in the previous two sections, 
8 and 9. 

Consider a mechanical system which is symmetric when in an equilibrium
position. For example, the ozone molecule consisting of three oxygen
atoms at the corners of an equilateral triangle is very symmetric (Fig. 9).
This particular system can be moved into many new positions in which it 
looks and behaves the same as it did before it was moved. For example, 
the system can be rotated through an angle of 120° about an axis through 
the centroid perpendicular to the plane of the triangle. It can also be reflected 
in a plane containing an altitude of the triangle and perpendicular to the 
plane of the triangle. And it can be reflected in the plane containing the 
triangle. Such a motion is called a symmetry of the system. The system 
above has twelve symmetries (including the identity symmetry, which is to 
leave the system fixed). 

Since any sequence of symmetries must result in a symmetry, the sym- 
metries form a group G under successive application of the symmetries as the 
law of combination. 

Let X = (x₁, ..., x_n) be an n-tuple representing a displacement of the
system. Let a be a symmetry of the system. The symmetry will move the




Fig. 9 




system to a new configuration in which the displacement is represented by 
X'. The mapping of X onto X' will be represented by a matrix D(a); that 
is, D(a)X = X'. If a new symmetry b is now applied, the system will be 
moved to another configuration represented by X" where X" = D(b)X'. 
But since ba moves the system to the configuration X" in one step, we have 
X" = D(ba)X = D(b)D(a)X. This holds for any X, so we have D(ba) =
D(b)D(a). Thus, the set D(G) of matrices obtained in this way is a repre-
sentation of the group of symmetries. 

The idea behind the application of representation theory to the analysis 
of symmetric mechanical systems is that the irreducible invariant subspaces 
under the representation D(G) are closely related to the principal axes of 
the system. 

Suppose that a group G is represented by a group of linear transformations
{σ_a} on a vector space V. Let f be a Hermitian form and let g be the sym-
metrization of f defined by

    g(α, β) = (1/g) Σ_a f(σ_a(α), σ_a(β)),                              (10.1)

where the sum runs over the g elements a of G.

Let A = {..., α_1^r, ..., α_{n_r}^r, ...} be a basis for V such that {α_1^r, ..., α_{n_r}^r}
is a basis for the irreducible subspace on which G is represented by D^r(G)
in unitary form; that is,

    σ_a(α_i^r) = Σ_{j=1}^{n_r} a_{ji}^r(a) α_j^r,                        (10.2)

where D^r(a) = [a_{ij}^r(a)] is unitary. Then, by (9.17)',

    g(α_i^r, α_k^s) = (1/g) Σ_a f(σ_a(α_i^r), σ_a(α_k^s))

                    = (1/g) Σ_a Σ_{j=1}^{n_r} Σ_{l=1}^{n_s} a_{ji}^r(a) \overline{a_{lk}^s(a)} f(α_j^r, α_l^s)

                    = Σ_{j=1}^{n_r} Σ_{l=1}^{n_s} [ (1/g) Σ_a a_{ji}^r(a) \overline{a_{lk}^s(a)} ] f(α_j^r, α_l^s)

                    = δ_{rs} δ_{ik} (1/n_r) Σ_{j=1}^{n_r} f(α_j^r, α_j^r).          (10.3)

If there is at most one invariant subspace corresponding to each irreducible
representation of G, the matrix representing g with respect to the basis A
would be a diagonal matrix. If a given irreducible representation occurs
more than once as a component of D(G), then terms off the main diagonal
can occur, but their appearance depends on the values of f(α_j^r, α_k^s). If f is
left invariant under the group of transformations (that is, f(σ_a(α), σ_a(β)) =
f(α, β) for all a ∈ G), then g = f and the same remarks apply to f.
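To see (10.1)-(10.3) in action on a small example, the sketch below (not from the text)
averages an arbitrary positive-definite form over the 2-dimensional representation D³ of 𝔖₃
listed in Section 9 (the integral, non-unitary matrices). Whatever positive-definite f we
start from, the averaged form g is a scalar multiple of one fixed G-invariant form on this
irreducible subspace; in a unitary coordinate system that invariant form is the identity,
which is what (10.3) expresses.

    import numpy as np

    # The 2-dimensional representation D^3 of S_3 in the integral form listed in Section 9.
    A = np.array([[0., -1.], [1., -1.]])      # D^3((123))
    B = np.array([[0.,  1.], [1.,  0.]])      # D^3((12))
    D3 = [np.eye(2), A, A @ A, B, A @ B, B @ A]   # all six group elements

    def symmetrize(F):
        """Average D(a)^T F D(a) over the group, the matrix form of (10.1) for a real form F."""
        return sum(D.T @ F @ D for D in D3) / len(D3)

    rng = np.random.default_rng(0)
    M = rng.standard_normal((2, 2))
    F = M.T @ M + np.eye(2)                   # an arbitrary positive-definite form

    G = symmetrize(F)
    print(G / G[0, 0])    # always proportional to [[1, -0.5], [-0.5, 1]], independent of F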

By a symmetry of a mechanical system we mean a motion which preserves 
the mechanical properties of the system as well as the geometric properties. 
This means that the quadratic forms representing the potential energy and 
the kinetic energy must be left invariant under the group of symmetries. 
If a coordinate system is chosen in which the representation is unitary and 
decomposed into its irreducible components, considerable progress will be 
made toward finding the principal axes of the system. If D(G) contains each
irreducible representation at most once, the decomposition of the repre- 
sentation will yield the principal axes of the system. If the system is not very 
symmetric, the group of symmetries will be small, there will be few in- 
equivalent irreducible representations, and it is likely that the reduced form 
will fall short of yielding the principal axes. (As an extreme case, consider 
the situation where the system has no symmetries except the identity.) How- 
ever, in that part of the representing matrices where r ≠ s the terms will be
zero. The problem, then, is to find effective methods for finding the basis A 
which reduces the representation. 

The first step is to find the irreducible representations contained in D(G).
This is achieved by determining the trace S_D for D and using formula (9.21)'.
The trace is not difficult to determine. Let U(a) be the number of particles
in the system left fixed by the symmetry a. Only coordinates attached to
fixed particles can contribute to the trace S_D(a). If a local coordinate system
is chosen at each particle so that corresponding axes are parallel, they will
be parallel after the symmetry is applied. Thus, each local coordinate system
(at a fixed point) undergoes the same transformation. If the local coordinate
system is Euclidean, the local effect of the symmetry must be represented
by a 3 × 3 orthogonal matrix since the symmetry is distance preserving.
The trace of a matrix is the sum of its eigenvalues, since that is the case when
it is in diagonal form. The eigenvalues of an orthogonal matrix are of
absolute value 1. Since the matrix is real, at least one must be real and the
others real or a pair of complex conjugate numbers. Thus, the local trace is

    ±1 + e^{iθ} + e^{−iθ} = ±1 + 2 cos θ,

and

    S_D(a) = U(a)(±1 + 2 cos θ).                                         (10.4)

The angle θ is the angle of rotation about some axis and it is easily determined
from the geometric description of the symmetry. The +1 occurs if the
symmetry is a local rotation, and the −1 occurs if a mirror reflection is
present.
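Formula (9.21)' is the multiplicity formula; the standard form n_r = (1/g) Σ_i h_i S_D(C_i)
\overline{χ^r(C_i)} is assumed here. The sketch below (not from the text) applies it to the
regular representation of 𝔖₃, whose trace is g at the identity and 0 elsewhere, using the
character table of Section 9; each irreducible representation appears as many times as its
dimension.

    import numpy as np

    # Character table of S_3 (rows D^1, D^2, D^3; columns C_1, C_2, C_3) and class sizes.
    h = np.array([1, 2, 3])
    X = np.array([[1,  1,  1],
                  [1,  1, -1],
                  [2, -1,  0]])
    g = h.sum()

    # Trace of the representation to be decomposed, one value per conjugate class.
    # For the regular representation the trace is g at the identity and 0 elsewhere.
    S_D = np.array([g, 0, 0])

    # Multiplicities n_r = (1/g) sum_i h_i S_D(C_i) conj(chi^r(C_i)).
    n = (X.conj() * h) @ S_D / g
    print(n)              # [1. 1. 2.] : D_reg = D^1 + D^2 + 2 D^3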

Once it is determined that the representation D^r(G) is contained in D(G),
the problem is to find a basis for the corresponding invariant subspace. 



If {α_1^r, ..., α_{n_r}^r} is the required basis, we must have

    σ_a(α_i^r) = Σ_{j=1}^{n_r} a_{ji}^r(a) α_j^r.                        (10.5)

If α_i^r is represented by X_i (unknown) in the given coordinate system, then
σ_a(α_i^r) is represented by D(a)X_i. Thus, we must solve the equations

    D(a)X_i = Σ_{j=1}^{n_r} a_{ji}^r(a) X_j,     i = 1, ..., n_r,        (10.6)

simultaneously for all a ∈ G. The a_{ji}^r(a) can be computed once for all and
are presumed known. Since each X_i has n coordinates, there are n · n_r
unknowns. Each matric equation of the form (10.6) involves n linear equa-
tions. Thus, there are g · n · n_r equations. Most of the equations are re-
dundant, but the existence or non-existence of the rth representation in
D(G) is what determines the solvability of the system. Even when many
equations are eliminated as redundant, the system of linear equations to be
solved is still very large. However, the system is linear and the solution can
be worked out.
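One way to organize the solution of (10.6), sketched below with numpy (not from the text):
stack the conditions D(a)X_i − Σ_j a_{ji}^r(a)X_j = 0 for all a in G into a single homogeneous
system for the n·n_r unknown coordinates and read off its null space from a singular value
decomposition. For the illustration, D^r is the 2-dimensional representation D³ of 𝔖₃ from
Section 9, and D is an artificial 3-dimensional reducible representation built as D³ plus the
identity representation and then disguised by an arbitrarily chosen change of coordinates T;
in a mechanics problem D would be the displacement representation instead.

    import numpy as np

    def block_diag2(M, N):
        """Block-diagonal sum of two square matrices."""
        Z1 = np.zeros((M.shape[0], N.shape[1]))
        Z2 = np.zeros((N.shape[0], M.shape[1]))
        return np.block([[M, Z1], [Z2, N]])

    # D^3 matrices of S_3 as listed in Section 9 (integral form).
    A = np.array([[0., -1.], [1., -1.]])      # D^3((123))
    B = np.array([[0.,  1.], [1.,  0.]])      # D^3((12))
    D3 = [np.eye(2), A, A @ A, B, A @ B, B @ A]

    # A reducible representation D = T (D^3 + D^1) T^{-1} in scrambled coordinates.
    T = np.array([[1., 2., 0.],
                  [0., 1., 1.],
                  [1., 0., 1.]])
    D = [T @ block_diag2(M, np.eye(1)) @ np.linalg.inv(T) for M in D3]

    # The conditions D(a) X = X D^3(a) of (10.6), with X the 3 x 2 matrix whose
    # columns are X_1, X_2.  Column-stacking X turns each condition into
    # (I_2 (x) D(a) - D^3(a)^T (x) I_3) vec(X) = 0; stack them for all a in G.
    S = np.vstack([np.kron(np.eye(2), Da) - np.kron(M.T, np.eye(3))
                   for Da, M in zip(D, D3)])
    _, s, Vt = np.linalg.svd(S)
    null = Vt[s < 1e-8]                       # solutions: rows with zero singular value
    X = null[0].reshape(2, 3).T               # undo the column-stacking
    print(np.round(X, 6))   # columns span the invariant subspace carrying D^3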

There are ways the work can be reduced considerably. Some principal 
axes are obvious for one reason or another. Suppose that Y is an n-tuple 
representing a known principal axis. Then any other principal axis repre- 
sented by X must satisfy the condition 

    Y^T B X = 0                                                          (10.7)

since the principal axes are orthogonal with respect to the quadratic form B. 
There is also the possibility of using irreducible representations in other 
than unitary form in the equations of (10.6). The basis obtained will not 
necessarily reduce A and B to diagonal form. But if the representation uses 
matrices with integral coefficients, the computation is sometimes easier, 
and the change to an orthonormal basis can be made in each invariant 
subspace separately. 

BIBLIOGRAPHICAL NOTES 

There are a number of good treatments of different aspects of the applications of group 
theory to physical problems. None is easy because of the degree of sophistication required 
for both the physics and the mathematics. Recommended are: B. Higman, Applied
Group-Theoretic and Matrix Methods; J. S. Lomont, Applications of Finite Groups;
T. Venkatarayudu, Applications of Group Theory to Physical Problems; H. Weyl, Theory
of Groups and Quantum Mechanics; E. P. Wigner, Group Theory and Its Application to the 
Quantum Mechanics of Atomic Spectra. 






EXERCISES 

The following exercises all pertain to the ozone molecule described at the 
beginning of this section. However, to reduce the complexity of analyzing this 
system we make a simplification of the problem. As described at the beginning 
of this section, the phase space for the system is of dimension 9 and the group of 
symmetries is of order 12. This system has already been discussed in the exercises 
of Section 8. There we assumed that the displacements in a direction perpendicular 
to the plane of the triangle could be neglected. This has the effect of reducing the 
dimension of the phase space to 6 and the order of the group of symmetries to 6. 
This greatly simplifies the problem without discarding essential information, 
information about the vibrations of the system. 

1. Show that if the ozone molecule is considered as embedded in a 2-dimensional
space, the group of symmetries is of order 6 (instead of 12 when it is considered as
embedded in a 3-dimensional space). Show that this group is isomorphic to 𝔖₃,
the symmetric group on three objects.

2. Let (x₁, y₁, x₂, y₂, x₃, y₃) denote the coordinates of the phase space as illus-
trated in Fig. 6 of Section 8. Let (12) denote the symmetry of the figure in which
P₁ and P₂ are interchanged. Let (123) denote the symmetry of the figure which
corresponds to a counterclockwise rotation through 120°. Find the matrices 
representing the permutations (12) and (123) in this coordinate system. Find 
all matrices representing the group of symmetries. Call this representation 
D{G). 

3. Find the traces of the matrices in D(G) as given in Exercise 2. Determine 
which irreducible representations (as given in the example of Section 9) are con- 
tained in this representation. 

4. Show that since D(G) contains the identity representation, 1 is an eigenvalue
of every matrix in D{G). Show that the corresponding eigenvector is the same for 
every matrix in D{G). Show that this eigenvector spans the irreducible invariant 
subspace corresponding to the identity representation. On a drawing like Fig. 7, 
draw the displacement corresponding to this eigenvector. Give a physical inter- 
pretation of this displacement. 

5. There is one other 1-dimensional representation in D(G). Show that for a 
1-dimensional representation a character value is also an eigenvalue. Determine
the vector spanning the irreducible invariant subspace corresponding to this 
representation. Draw the displacement represented by this eigenvector and give 
a physical interpretation of this displacement. 

6. There are two 2-dimensional representations. One of them always corre- 
sponds to an invariant subspace that always appears in this type of problem. It 
corresponds to a translation of the molecule in the plane containing the molecule. 
Such a translation does not distort the molecule and does not play a role in deter- 
mining the vibrations of the molecule. Find a basis for the irreducible invariant 
subspace corresponding to this representation. 




7. In the previous exercises we have determined two 1-dimensional subspaces
and one 2-dimensional subspace of the representation space for D(G). There 
remains one more 2-dimensional representation to determine. At this stage the 
easiest thing to do is to use the orthogonality relations in formula (10.7) to find a 
basis for the remaining irreducible invariant subspace. Find this subspace. 

8. Consider the displacement vectors {ξ₅ = (0, 1, √3/2, −1/2, −√3/2, −1/2),
ξ₆ = (−√3/2, −1/2, 0, 1, √3/2, −1/2)}. Show that they span an irreducible invariant
subspace of the phase space under D(G). Draw these displacements on a figure
like Fig. 7. Similarly, draw −(ξ₅ + ξ₆). Interpret these displacements in terms
of distortions of the molecule and describe the type of vibration that would result
if the molecule were started from rest in one of these positions.

Note: In working through the exercises given above we should have seen that 
one of the 1-dimensional representations corresponds to a rotation of the molecule
without distortion, so that no energy is stored in the stresses of the system. Also, 
one 2-dimensional representation corresponds to translations which also do not 
distort the molecule. If we had started with the original 9-dimensional phase space, 
we would have found that six of the dimensions correspond to displacements 
that do not distort the molecule. Three dimensions correspond to translations in 
three independent directions, and three correspond to rotations about three 
independent axes. Restricting our attention to the plane resulted only in removing 
three of these distortionless displacements from consideration. The remaining 
three dimensions correspond to displacements which do distort the molecule, 
and hence these result in vibrations of the system. The 1-dimensional representation
corresponds to a radial expansion and contraction of the system. The 2-dimensional 
representation corresponds to a type of distortion in which the molecule is expanded 
in one direction and contracted in a perpendicular direction. 



Appendix 



A collection of 
matrices with 
integral elements 
and inverses with 
integral elements 



Any numerical exercise involving matrices can be converted to an equivalent 
exercise with different matrices by a change of coordinates. For example, 
the linear problem 

    AX = B                                                               (A.1)



is equivalent to the linear problem 

    A'X = B'                                                             (A.2)



where A' = PA and B' = PB and P is non-singular. The problem (A.2) even 
has the same solution. The problem 

    A"Y = B                                                              (A.3)

where A" = AP has Y = P⁻¹X as a solution if X is a solution of (A.1).

It should be clear enough how these modifications can be combined. 
Other exercises can be modified in a similar way. For this purpose it is most 
convenient to choose a matrix P that has integral elements (integral matrices). 
For (A.3), it is also desirable to require P⁻¹ to have integral elements.
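As a small illustration of these modifications (not from the text, and with an arbitrarily
chosen exercise matrix A and column B), using the first pair P, P⁻¹ from the list at the
end of this appendix:

    import numpy as np

    A = np.array([[1., 2.],
                  [3., 4.]])
    B = np.array([[5.],
                  [6.]])

    # Disguise the problem with an integral P that has an integral inverse.
    P = np.array([[2., 1.],
                  [5., 3.]])

    A1, B1 = P @ A, P @ B                  # problem (A.2): A'X = B'
    print(np.linalg.solve(A, B))           # solution X of (A.1)
    print(np.linalg.solve(A1, B1))         # the same X

    A2 = A @ P                             # problem (A.3): A''Y = B
    Y = np.linalg.solve(A2, B)
    print(P @ Y)                           # P Y recovers X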

Let P be any non-singular matrix, and let D be any diagonal matrix of the
same order. Compute A = PDP⁻¹. Then D = P⁻¹AP. A is a matrix similar
to the diagonal matrix D. The eigenvalues of A are the elements in the main
diagonal of D, and the eigenvectors of A are the columns of P. Thus, by
choosing D and P appropriately, we can find a matrix A with prescribed
eigenvalues (whatever we choose to enter in the main diagonal of D) and
prescribed eigenvectors (whatever we put in the columns of P). If P is orthog-
onal, A will be orthogonally similar to a diagonal matrix. If P is unitary, A
will be unitarily similar to a diagonal matrix.
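A minimal sketch of this recipe (not from the text), using one of the integral pairs from
the list below so that A itself comes out integral:

    import numpy as np

    # An integral P with integral inverse (taken from the list below).
    P    = np.array([[4, 3, 2],
                     [3, 5, 2],
                     [2, 2, 1]])
    Pinv = np.array([[-1, -1,  4],
                     [-1,  0,  2],
                     [ 4,  2, -11]])

    D = np.diag([1, 2, 3])        # prescribed eigenvalues
    A = P @ D @ Pinv              # integral matrix with eigenvalues 1, 2, 3 and
                                  # eigenvectors the columns of P

    print(A)
    print(np.sort(np.linalg.eigvals(A)))   # 1, 2, 3 up to rounding
    print(Pinv @ A @ P)                    # recovers D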

It is extremely easy to obtain an infinite number of integral matrices with 
integral inverses. Any product of integral elementary matrices will be integral. 





If these elementary matrices have integral inverses, the product will have an 
integral inverse. Any elementary matrix of Type III is integral and has 
an integral inverse. If an elementary matrix of Type II is integral, its inverse 
will be integral. An integral elementary matrix of Type I does not have an 
integral inverse unless it corresponds to the elementary operation of multiply- 
ing by ±1. These two possibilities are rather uninteresting. 

Computing the product of elementary matrices is most easily carried out 
by starting with the unit matrix and performing the corresponding elementary 
operation in the right order. Thus, we avoid operations of Type I, use only 
integral multiples in operations of Type II, and use operations of Type III 
without restriction. 
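A sketch of this procedure (not from the text): build P by applying a few integral operations
of Types II and III to the unit matrix, and build P⁻¹ from the inverses of those operations,
composed in the reverse order.

    import numpy as np

    def add_multiple(n, i, j, c):
        """Type II elementary matrix: add c times row j to row i."""
        E = np.eye(n, dtype=int)
        E[i, j] = c
        return E

    def swap(n, i, j):
        """Type III elementary matrix: interchange rows i and j."""
        E = np.eye(n, dtype=int)
        E[[i, j]] = E[[j, i]]
        return E

    ops = [add_multiple(3, 1, 0, 2), swap(3, 0, 2), add_multiple(3, 2, 1, -3)]
    inv_ops = [add_multiple(3, 1, 0, -2), swap(3, 0, 2), add_multiple(3, 2, 1, 3)]  # inverse of each op

    P = np.eye(3, dtype=int)
    for E in ops:                 # P = E3 E2 E1
        P = E @ P

    Pinv = np.eye(3, dtype=int)
    for E in inv_ops:             # P^{-1} = E1^{-1} E2^{-1} E3^{-1}
        Pinv = Pinv @ E

    print(P)
    print(Pinv)
    print(P @ Pinv)               # identity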

For convenience, a short list of integral matrices with integral inverses is 
given. In this list, the pair of matrices in a line are inverses of each other 
except that the inverse of an orthogonal, or unitary, matrix is not given. 





    P = [  2   1 ]              P⁻¹ = [  3  −1 ]
        [  5   3 ]                    [ −5   2 ]

    P = [  5   8 ]              P⁻¹ = [  5  −8 ]
        [  3   5 ]                    [ −3   5 ]

    P = [ −3   8 ]              P⁻¹ = [ −11   8 ]
        [ −4  11 ]                    [  −4   3 ]

    P = [  3   8 ]              P⁻¹ = [  11  −8 ]
        [  4  11 ]                    [  −4   3 ]

    P = [ 4  3  2 ]             P⁻¹ = [ −1  −1    4 ]
        [ 3  5  2 ]                   [ −1   0    2 ]
        [ 2  2  1 ]                   [  4   2  −11 ]

    P = [ 2  −5  5 ]            P⁻¹ = [ 43  −5  −25 ]
        [ 2  −3  8 ]                  [ 10  −1   −6 ]
        [ 3  −8  7 ]                  [ −7   1    4 ]

    P = [ 10  −6  3 ]           P⁻¹ = [  1   0  −3 ]
        [  8  −3  2 ]                 [ −2   1   4 ]
        [  3  −2  1 ]                 [ −7   2  18 ]

    P = [ 2   5   8 ]           P⁻¹ = [ −73   43  −25 ]
        [ 4   5  13 ]                 [ −17   10   −6 ]
        [ 1  −6  −1 ]                 [  29  −17   10 ]

    P = [ 1  2  3 ]             P⁻¹ = [ −2   0   1 ]
        [ 2  3  4 ]                   [  0   3  −2 ]
        [ 3  4  6 ]                   [  1  −2   1 ]

    P = [  4   3   2  0 ]       P⁻¹ = [  2  −1   1  0 ]
        [  5   4   3  0 ]             [ −1   0  −2  0 ]
        [ −2  −2  −1  0 ]             [ −2   2   1  0 ]
        [ 11   6   4  1 ]             [ −8   3  −3  1 ]

Orthogonal

    1/3 · [ 1   2   2 ]         1/3 · [ −2   1  −2 ]        1/3 · [  1   2  2 ]
          [ 2  −2   1 ]               [  1  −2  −2 ]              [ −2  −1  2 ]
          [ 2   1  −2 ]               [ −2  −2   1 ]              [  2  −2  1 ]

    1/9 · [ 7   4   4 ]
          [ 4   1  −8 ]
          [ 4  −8   1 ]

Unitary

    1/5 · [ 4   3i ]            1/10 · [ 7 + i    −1 + 7i ]
          [ 3i   4 ]                   [ 1 + 7i    7 − i  ]

    1/26 · [  7 + 17i   −17 + 7i ]          [ cos θ    i sin θ ]
           [ 17 + 7i     7 − 17i ]          [ i sin θ   cos θ  ]     (θ real)



Answers 
to selected 
exercises 



I-1

1. If f and g are (continuous, integrable, differentiable m times, satisfy the
differential equation) and a is a constant, then f + g and af also are (con-
tinuous, etc.). With the exception of A1 and B1, any vector space axiom
which is satisfied in a set is satisfied in any subset.

5. B1 is not satisfied if a is negative.

6. α = (a⁻¹a)α = a⁻¹(aα) = a⁻¹0 = 0.

9. (a) (−1, −2, 1, 0); (b) (5, −8, −1, 2); (c) (6, −15, 0, 3); (d) (−5, −1, 3,
−1).



I-2

1. 3p₁ + 2p₂ − 5p₃ + 4p₄ = 0.

2. The set of all polynomials of degree 2 or less, and the zero polynomial.

3. Every subset of three polynomials is maximal linearly independent.

6. No polynomial is a linear combination of the preceding ones.

7. 1 cannot be expressed as a linear combination of polynomials divisible by x − 1.

8. (a) dependent; (b) dependent; (c) independent.



I-3

3. {(1, 2, -1, 1), (0, 1, 2, -1), (1, 1, 0, 0), (0, 0, 1, 1)}. 

4. {(1, 2, 3, 4), (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)}, for example. 



I-4

1. (a), (b), and (d) are subspaces. (c) and (e) do not satisfy A1.

3. The equation used in (c) is homogeneous. This condition is essential if A1
and B1 are to be satisfied. Why?

5. (2, −1, 3, 3) = −(1, 1, 0, 0) + 3(1, 0, 1, 1), (0, 1, −1, −1) = (1, 1, 0, 0) −
(1, 0, 1, 1); (1, 1, 0, 0) = ½(2, −1, 3, 3) + (3/2)(0, 1, −1, −1), (1, 0, 1, 1) =
½(2, −1, 3, 3) + ½(0, 1, −1, −1).

6. {(1, 1, 1, 1, 1), (1, 0, 1, 0, 1), (0, 1, 1, 1, 0), (2, 0, 0, 1, 1)} is a basis for W.

8. W₁ ∩ W₂ = ⟨(−1, 2, 1, 2)⟩, W₁ = ⟨(−1, 2, 1, 2), (1, 2, 3, 6)⟩, W₂ = ⟨(−1,
2, 1, 2), (1, −1, 1, 1)⟩, W₁ + W₂ = ⟨(−1, 2, 1, 2), (1, 2, 3, 6), (1, −1, 1, 1)⟩.

9. W₁ ∩ W₂ is the set of all polynomials divisible by (x − 1)(x − 2). W₁ + W₂ =
P.

11. If W₁ ⊄ W₂ and W₂ ⊄ W₁, there exist α₁ ∈ W₁ − W₂ and α₂ ∈ W₂ − W₁.
α₁ + α₂ ∉ W₁ since α₁ ∈ W₁ and α₂ ∉ W₁. Similarly, α₁ + α₂ ∉ W₂. Since
α₁ + α₂ ∉ W₁ ∪ W₂, W₁ ∪ W₂ is not a subspace.

12. {(1, 1, 1, 0), (2, 1, 0, 1)} is a basis for the subspace of solutions.

16. Since W₁ ⊂ W, W₁ + (W ∩ W₂) ⊂ W. Every α ∈ W ∩ (W₁ + W₂) can be
written in the form α = α₁ + α₂ where α₁ ∈ W₁ and α₂ ∈ W₂. Since W₁ ⊂ W,
α₁ ∈ W. Thus α₂ ∈ W and α ∈ (W ∩ W₁) + (W ∩ W₂) = W₁ + (W ∩ W₂).
Thus W = W ∩ (W₁ + W₂) = W₁ + (W ∩ W₂). Finally, it is easily seen
that this last sum is direct.



II-1

2. {o x + ffgXOa* X 2» = ( x l + X 2, ~ X 1 - X 2)> ^l^CO^l. ^2)) = (-^2. _a; l)» 

ffa^CCa?!, » 2 )) = (a^i)- 

4. The kernel of a is the set of solutions of the equations given in Exercise 12 of 
Section 4, Chapter I. 

5. {(1, 0, 1), (0, 1, -2)} is a basis of a(U). {(-4, -7, 5)} is a basis of K(a). 
7. See Exercise 2. 

12. By Theorem 1.6, P (<x) = dim V = dim {K(a) n V'} + dim t(V') < dim K{r) + 
dim tg{U) = v{t) + p(tct). 

13. By Exercise 12, p (to) > P (a) - v( T ) = p(ff) - (m - p(r)) = p(<r) + />(t) - 
w. The other part of the inequality is Theorem 1.12. 

14. By Exercise 13, v(t<j) = n — p(to) < n — (p(cr) 4- p(t) — m) = (n — p(a)) + 
(m — p(r)) = v(o) + v{t). v{ra) < n since K(tg) <= L/. The inequality 
v(o) < v(to) follows from the fact that K(a) <= K(ra). Since t<t(L/) e 
t(V) we have p(ror) < p (t). Thus v(t<t) = n — p(to) > n — p(-r) = n — m + 

Kt). 

15. By Exercise 14, v(to) = v(a). 

17. By Exercise 14, v{tg) = i>(t). 

18. p(o x + <r 2 ) = dim {a x + o 2 )(U) < dim {^(U) + a 2 (U)} = dim ^(U) + dim or^U) - 
dim (0 X (U) n 2 (U)} < P {a x ) + P {a 2 ). 

19. Since p(a 2 ) = p{ — a 2 ), Exercise 18 also says that p{a x — a 2 ) <; P {a x ) + p{o 2 ). 
Then P {a x ) = P (a 2 - (a x + a 2 )) < P {o 2 ) + P {o 1 + a 2 ). By symmetry, P (o 2 ) < 
p(°i) + p( ff i + <T 2)- 



II-2 

r 37" 

2. 



Answers to Selected Exercises 



327 



3. AB = 



_4 _4 _4 -4" 

4 4 4 4 



0J, 



BA = 



4 

12 

-4 

-12 



6 

14 

-6 

-14 



5 

13 

-5 

-13 



5" 

13 

-5 

-13 



Notice that AB and BA have different ranks. 
















" 3 -1" 
_-l 2 - 


5. 


" 3 -r 

_-5 2_ 




6. 


r 2 1_ i 

5 5 

1 3 

L5 5J 




(«) 


"-I 0" 
_ -1. 


(6) 


- o -r 


j 


"-1 0" 
1. 


(c) 


" 5 _1-| 
2 2 

1 5 
. 2 2J 


J 


"2 
_0 


(«0 


"i r 




(* 


) 


5 12-1 
13 13 

12 5 

13 13J 




(/) 


r2 

3 

J 


in 

3 

1 
3_ 


, 


"1 

_0 


0" 
0_ 



a 



9. (a) Reflection in the line y = x. See 8(6). (6) Projection onto %-axis parallel 
to x 2 -axis, followed by 90° rotation counterclockwise. See 8(/) and 8(e). 
(c) Projection onto a^-axis parallel to line x 1 + x 2 = 0. See 8(/). (d) Shear 
parallel to x 2 -axis. See $(d). (e) Stretch » x -axis by factor b, stretch » 2 -axis by 
factor c. See 8(c). (/) Rotation counterclockwise through acute angle 
6 = arccos -f . 
10. (a) y =x (onto, fixed), (6) none, (c)x 2 = (onto, fixed), x t + x 2 = (into), 
(</) Xl = (onto, fixed), (e) x 2 = (onto, stretched by factor b), x 1 = (onto, 
stretched by factor a), (/) none. 
-1 1 



11. 



1 

1 1 



12. If {d x , d 2 ,..., d n } is the set of elements in the diagonal of D, then multiplying 
A by D on the right multiplies column k by d k and multiplying A by D on the 
left multiplies row k by d k . 

13. A is a diagonal matrix. 

Fa b~ 

15. Let /be represented by 

16. The linear transformation, and thereby the function /(a; + yi), can be repre- 
~a -K 



sented by the matrix 
17. For example, A = 



b 

1 o- 





B 



-n 



II-3 

2. A 2 = I. 

3. (0, 0, 0). See Theorem 3.3. 



5 

3. 
5-J' 



6. 



o -r 

-1 o 



"1 -1 

1 





Answers to Selected Exercises 





-1 0" 







1 




1 








328 

o o r 

-10 

.010. 

9. If a is an automorphism and S is a subspace, o(S) is of dimension less than or 
equal to dim S. Since K(a) is of dimension zero in V, it is of dimension zero 
when confined to S. Thus dim o(S) = dim S. 



10. For example, A = 







ri on 


"1 o o- 

1 


B = 


1 






Lo oj 



II-4 



1 


-2 


-1 


1 


-1 


1 


1 


1 


1 



5. P = 



"0 


1 


r 


1 





l 


_1 


1 


o_ 



p-i =2 







" 


i 


V31 




3. 


2 

-V3 


2 
1 






L 2 


2 . 


-1 


1 


r 






1 


-1 


l 






1 


1 


-l 







6. PQ is the matrix of transition from A to C. The order of multiplication is 
reversed in the two cases. 

II-5 

1. (a), (c), {d). 

2. (a) 3, (b) 3, (c) 3, (d) 3, (<?) 4. 

3. Both A and 5 are in Hermite normal form. Since this form is unique under 
possible changes in basis in R 2 , there is no matrix Q such that B = Q~ 1 A. 



II-6 

1. (a) 2, (b) 2, (c) 3. 

2. (a) Subtract twice row 1 from row 2. (b) Interchange row 1 and row 3. 
(c) Multiply row 2 by 2. 

3. (From right to left.) Add row 1 to row 2, subtract row 2 from row 1, add 
row 1 to row 2, multiply row 1 by — 1 . The result interchanges row 1 and row 2. 

"1 



5. (a) 



6. (a) 



(b) 



(b) 



-1 

1 2 






3 

-2 



1 

-2 
1 



Answers to Selected Exercises ^^ 



II-7 

1. For every value of x z and » 4 , (x lt x 2 , x z , z 4 ) = x z (\, 1,1,0)+ a? 4 (2, 1, 0, 1) is 
a solution of the system of equations. Since the system of homogeneous 
equations is of rank 2 and these two solutions are linearly independent, they 
span the space of solutions. 

2. <(-l,2, 1,0>. t . 

3. (a) (2, 3, 0, 0) + <(1, -2, 1, 0), (-1, 1, 0, 1)>, (b) no solution. 

4. (3, 5, 0) + <(5, 7, 1)>. 

5. (-3,2,0,0,1) + ((-3,0,1,0,0), (1,2,0, 1,0)>. 

6. If the system has m equations and n unknowns, the augmented matrix is 
m x (n + 1). The reduced form of the augmented matrix contains the reduced 
form of the coefficient matrix in the first n columns. Their ranks are equal 
if and only if the last column of the reduced form of the augmented matrix does 
not start a non-zero row. 



II-8 

2. {(1, 0, 0, 0, 1), (0, 1, 0, 0, 1), (0, 0, 1, 0, 0), (0, 0, 0, 1, -1)}. 

3. {(1, 0, 2, -f), (0, 1, 0, f)} is a standard basis of the subspace spanned by the 
first set and {(1 ,0,1, 0), (0, 1 , 0, f )} is a standard basis for the second. Hence 
these subspaces are not identical. 

4. {(1,0, 0, 1), (0, 1,0, -f), (0, 0, 1, i)} is a standard basis of W 1 + VV 2 . 

5. {(1, 0, -1,0, 2), (0, 1, 2, 0, 1), (0, 0, 0, 1, 1)} is a stand basis. x x - 2x 2 + 
x s = 0, -2^! - x 2 - x 4 + x 5 = is a characterizing system. 

6. W x = <(£, 0, h 1), (-1, 1,0, 0)>, £! = <(!, 1,0, -|), (0,0, 1, -£)>, W 2 = 
<(-2, 3, 0, 1), (3, -4, 1, 0)>, E 2 = <(1, 0, -3, 2), (0, 1, 4, -3)>, E x + £ 2 = 
<(1, 0, 0, |), (0, 1,0, -1), (0, 0, 1, -|)>, W 1 nW 2 = <(-i 1, i 1)>, W x = 
<(-!, 1,1, 1), (0, 0, 1, -1)>, W 2 = <(-!, 1, h 1), (0, 1, 4, -3)), W x + W 2 = 
<(-i 1,1, 1), (0,1,4, -3), (0,0, 1, -!)>. 



III-1

(1 2 3\ /l 2 3\ 
1,1 J , and identity permutation are even. 

3. Of the 24 permutations, eight leave exactly one object fixed. They are per- 
mutations of three objects and have already been determined to be even. 
Six leave exactly two objects fixed and they are odd. The identity permutation 
is even. Of the remaining nine permutations, six permute the objects cyclically 
and three interchange two pairs of objects. These last three are even since they 
involve two interchanges. 

4. Of the nine permutations that leave no object fixed, six are of one parity and 
three others are of one parity (possibly the same parity). Of the 15 already 
considered, nine are known to be even and six are known to be odd. Since 
half the 24 permutations must be even, the six cyclic permutations must be 
odd and the remaining even. 

5. 77 is odd. 



330 



Answers to Selected Exercises 



III-2 

1. Since for any permutation n not the identity there is at least one / such that 
tt(/) < /, all terms of the determinant but one vanish. |det A\ = Y\.l=x a a- 

2. 6. 

4. (a) -32, (b) -18. 



III-3 

1. 145; 134. 

2. -114. 

4. By Theorem 3.1, A -A = (det A)I. Thus det A -det A = (det A) n . If det A?±0, 
then det A = (det A)"' 1 . If det .4=0, then A ■ A = 0. By (3.5), ^ a«^w = 
for each k. This means the columns of A are linearly dependent and det A = 
= (det A) n ~\ 

5. If det A 5* 0, then /? -1 = A/det A and /? is non-singular. If det A = 0, then 
det A = and /? is singular. 



2/i 






= I^(-D n+i - 1 det J B i , 
1=1 



where -6 4 is the matrix obtained by crossing out row / and the last column, 

n I n \ 

i=i u=i J 

where Q,- is the matrix obtained by crossing out column y and the last row of B t , 

= i i^(-i) 2n+ ^- 3 (-i) z+ Mo- 

«'=ii=i 

n n 

= -I IM^= -r T ^. 
i=i j+i 



III-4 



2. 
3. 


—X 3 
— X 3 


+ 2x 2 

+ 6x 2 


+ 5a: 
- 11a 


- 6. 

+ 6. 




'0 





- 


-6" 




4 


1 








1 




*T. 





1 


- 


-2 












1 - 


-3 





5. If A 2 + A + I = 0, then A(-A - I) = I so that -A - I = A~ 



Answers to Selected Exercises 



331 



6. If A is a real 3x3 matrix its characteristic polynomial/^) is real of degree 3. If 
A satisfies a; 2 + 1 = 0, the minimum polynomial would divide x 2 + 1 and could 
not have as an irreducible factor the real factor of degree one which /(x) must 
have. 

~~ 8. x 2 — 81 is the minimum polynomial. 

1 -1 . 



III-5 

1. If I is an eigenvector with eigenvalue 0, then I is a non-zero vector in the 
kernel of a. Conversely, if a is singular, then any non-zero vector in the kernel 
of a is an eigenvector with eigenvalue 0. 

2. If <r(!) = A£, then o a (f) = <r(A|) = A 2 f. Generally, cr»(f) = A w |. 

3. If <r(f) = A x | and r(f) = A 2 |, then (<r + t)(|) = <r(f) + t(|) = A x f + A 2 | = 
(A x + A 2 )£. Also, (fl<r)(f) = flff(f) = fl^f. 



"1 0" 


and 


"2 2' 


_2 2. 




_o i_ 



4. Consider, for example, 

5. o(A) is an eigenvalue of p(a) corresponding to f. 

6. If <r(f) = A|, then £ = ff -1 ^) = ^~ X (D so that f is an eigenvector of a x 
corresponding to A -1 . 

7. Let {Ij, . . . , !„} be a linearly independent set of eigenvectors with eigenvalues 
{A l5 . . . , Aj. Then *(£, ^) = J, <r(S<) = £< ^i- But since J, f< is also an 
eigenvector (by assumption), we have <x(2t £*) = ^ 2* £*"• Tnus .2* (**' ~ 
A)£ f = 0. Since the set is linearly independent, A; - A = for each i. 

8. x" is the characteristic polynomial and also the minimum polynomial. An 
eigenvector p(x) must satisfy the equation D(p(x)) = kp(x). The constants 
are the eigenvectors of D. 

9. c is the corresponding eigenvalue. 

11. If l x + £ 2 is an eigenvector with eigenvalue A, then A(| x + l 2 ) = a (h + £2) = 
Vi + Va- Since (A - A^ + (A - A 2 )£ 2 = Owe have A - A x = A - A 2 = 0. 

12. If £ = 2<=i fl <£f is an eigenvector with eigenvalue A, then A^^a^ = A£ = 

*(£) = 2J=i«< ff (W = 2<=i W<- Then SUW* - W< = and a t {X - 
A 2 ) = for each i. Since the A t are distinct at most one of the A - A, can be zero. 
For the other terms we must have a< = 0. Since f is an eigenvector, and hence 
non-zero, not all a t = 0. 



III-6 

1. -2, (1, -1); 7, (4, 5). 2. 3 + 2i, (1,0; 3 - 2i, (1_, -/). 

3. -3, (1, -2); 2, (2, 1). 4. 2, (^2, -1); 3, (1, -Jl). 

5. 4, (1,0, 0); -2, (3, -2, 0); 7, (24, 8, 9). 

6. 1, (-1,0,1); 2,(2, -1,0); 3,(0,1, -1). 

7. 9,(4,1, -1); -9,(1, -4,0), (1,0,4). 

8. l,(i\l,0); 3, (1,1,0), (0,0, 1). 



III-7 

1. For each matrix A, P has the components of the eigenvectors of A in its 
columns. Every matrix in the exercises of Section 6 can be diagonalized. 

2. The minimum polynomial is (x — l) 2 . 



332 Answers to Selected Exercises 

3. Let a be the corresponding linear transformation. Since a ^ 1, there is a 
non-zero vector £ x such that a(^ == £ 2 ^ ^. | 2 ^ since cr is non-singular. 
If {£i> £2} is a basis, then the matrix representing a with respect to this basis 
has the desired form. On the other hand, suppose that for every choice of 
£i> (£i> £2} i s dependent. Then £ 2 is a multiple of £ x and every vector is an 
eigenvector. By Exercise 7 of Section 5 this is impossible. 

4. A~\AE)A = BA. 

5. If ttj and v 2 are projections of the same rank k, we are asked to find a non- 
singular linear transformation a such that <r*-ir x o = tt 2 . Let {a l5 . . . , a„} be a 
basis such that ^(a^) = «.. for / <; k and ^(a^) = for / > k. Let {fa, ... , 
/?„} be a basis having similar properties with respect to tt 2 . Define a by the 
rule o(fa) = a,. Then a^n^fa) = a^ir^a,) = <r-\et t ) = fa for 1 < k, and 
f^iPi) = ff -1 ^i(«i) = ff -1 (0) = for 1 > k. Thus a^^a = tt 2 . 

6. By (7.2) of Chapter II, TrC^iM) = TriBAA' 1 ) = Tr(5). 

IV-1 

1. (a) [1 1 1]; (c)[V2 0]; (</) [-$ 1 0]. (b) and (<?) are not linear 
functionals. 

2. («){[1 0], [0 1 0], [0 1]}; (6){[1 -1 0], [0 1 -1], 
[0 1]}; (c){[i i -*],[-* * -*],[* I i». 

5. If a = 2? =1 *<«<, then ^(a) = ^Xi *<M a i) = «y. 

6. Let A = {a x , . . . , a„} be a basis such that 04 = a and a 2 = /?. Let >\ = 
tA> • • • » 0n} be the dual basis. Then ^(a) = 1 and ^ X (/S) = 0. 

7. Let /?(x) = x. Then <*<,(») # <* 6 0«). 

8. The space of linear functionals obtained in this way is of dimension 1 , and 
P n is of dimension n > 1. 

9- /» = 2£-i fn;=i (* - «*>) = 2u **<*)• /'k> = w. 

10. ^CQU ****(*)) = 2*-i W**(*» = 2U**y7j£) "„<**<*)) = Ay If 

2*=x bkhii( x ) = 0» then ^(0) = 6, = 0. Thus {/*i(a), . . . , /*„(#)} is linearly inde- 
pendent. Since Oi{hj(x)) = 8 ijf the set {cr^ . . . , a n } is a basis in the dual space. 

11. By Exercise 5,p(x) = 2*=i ^(pO*))*^*) = 2iU 7^ **(*)• 

12. Let {a l5 . . . , a r } be a basis of W. Since <x ^ W, {a x , . . . , a r , a } is linearly 
independent. Extend this set to a basis {a 1? . . . , a n } where a r+1 = a . Let 
{<f> lt . . . , <£ n } be the dual basis. Then <f> r+1 has the desired property. 

13. Let A = (aj, . . . , oc n } be a basis of V such that {a l5 . . . , a r } is a basis of W. 

Let A ={#!,... , <£ w } be the basis dual to A. Let y = 2X=i v( a j)^j- Then 
for each 0^ e W, y> (a t ) = y(aj). Thus y and y> coincide on all of VV. 

14. The argument given for Exercise 12 works for W = {0}. Since a ^ VV, there 
is a <f> such that <£(a) 5* 0. 

15. Let W = <j3>. If a £ W, by Exercise 12 there is a ^ such that <£(°0 = 1 and 
*(/*) = 0. 

IV-2 

1. This is the dual of Exercise 5 of Section 1. 

2. Dual of Exercise 6 of Section 1. 3. Dual of Exercise 12 of Section 1. 
4. Dual of Exercise 14 of Section 1. 5. Dual of Exercise 15 of Section 1. 



Answers to Selected Exercises 



333 



IV-3 

1. P = (P- 1 )^ = (PT)~\ 

1 1 o - 



2. P = 



1 







1 



1 -1 



(p-l)T = 



-1 



1 -1 
1 -1 -1 



Thus,!' = {[-1 1 1], [2 -1 -1], [1 -1]}. 

3. {[1 -1 0], [0 1 -1], [0 1]}. 

4- {.L^j J ~2li \-~l 2 — ' 2 J' L2 2 2]/" 

5. BX = B(PX') = (BP)X' = B'X'. 



IV-4 

1. («){[1 1 1]} (b){[-\ -1 1 1]}. 

2. [1 -1 1]. 

3. Let W be the space spanned by {a}. Since dim W = 1, dim VV-L = dim V - 
1 <dimV. Thus there is a <£<£W-L. 

4. If <f> e T-L, then <£a = for all aeT. But since S <= T, <£a = for all a e S 
also. Thus T-L «= SJ-. 

5. Since S-L-L is a subspace containing S, <S> <= S-L-L. By Exercise 4, S <= <S>, 
S-L => <S>^, S-LJ- c (S)ii = <s>. 

6. Since S c S + T, S-L => (S + T)-L. Similarly, T-L => (S + T)-L. Thus (S + 
T)± cjin T-L. Since S nT c S, (S n T)i => S-L. Similarly, (S n T)-L => 
T-L. Since (S n T)J- is a subspace, (S n T)J- = S-L + T-L. 

7. If S and T are subspaces, S-LJ- = S and T^-L = T. Thus SnT = JH n 
T-L-L => (S^ + T-L) J- and hence (S n T)± <= (S-L + TJ-)-L-L = (S-L + T±). 
Similarly, S + T = S^i- +THc(Si n T-L)^ and hence (S + T)-L => 
(S-L n T±)-L± = S-L n T-L. 

8. S-L + T-L = (S n T)-L = {0}^ = K. 

9. S-L n T-L = (S + T)-L = VJ- = {0}. 

10. By Exercises 9 and 10, S-L + T-L = V, and the sum is direct. For each y>eV 
define y»i G S by the rule: v>i a = V a for all a e S. The mapping of y> G V onto 
y x G S is linear and the kernel is S-L. By Exercise 13 of Section 1 every functional 
on S can be obtained in this way. Since V = S-L © T-L, S is isomorphic to T-L. 

11. f(t) = 4>(ta. + (1 -/)/5) is a continuous function off. Since £(/ a + (1_- 00) = 
/0(a) + (1 - t)4>(P), fit) > if a, j8 e S+ and < t < 1. Thus a/3 c S+. If 
a g S+ and /J G S~, then/(0) < and/(l) > 0. Since/is continuous, there is a 
t, < t < 1, such that/(0 = 0. 



IV-5 

A. 

1. Let t be a mapping of U into V and er a mapping of V into W. Then, if <f> G W, 
we have for all f eU, (^))(l) = ^(1)] = M^(l)] = a(*)[r(f)] = 

2. {[-6 2 1]}. 

3. [1 -2 1]. 



334 



Answers to Selected Exercises 



IV-8 



1. 



1 
-1 



"1 3 51 




3 5 7 


+ 


1-5 7 9_ 





and d&tA T = det {-A) = (■ 



1 

2 
l) n det A 



-1 -2 
-1 



1 0_ 

Thus det A = 



5. det A? = det A 

-det A. 
7. <r,(a) = if and only if o f (*)(p) = /(<x, 0) = for all £ V. 
9. Let dim U = m; dim V = «. /(a, #) = for all £ V means a e [^(V)]^ or, 

equivalently, a e ct^KO). Thus p(r f ) = dim -^(V) = m — dim [^(V)]- 1 - = 

m — dim ff-KO) = m — v(o f ) = p(a f ). 

10. If m 9* n, then either p(oy) < m or ,0(7^) < «. 

11. L/ is the kernel of a f and V is the kernel of t / . 

12. m — dim U = m — v(a f ) = p(o f ) = p(r f ) = n — v(r f ) = n — dim V . 

13. =/(a + p, a + 0) = /(a, a) +/(a, 0) + /(/?, a) + /(£, /?) =/(«, /?) + 
/(£, a). 

14. If /4fi = BA, then C45) T = (BA) T = A T B T = AB. 

15. B is skew-symmetric. 

16. (a) (A 2 )? = A T A T = (-AK-A) = A 2 ; 

(b) (AB - BA) T = (AB) T - (BA)T = B(-A) - (-A)B = AB - BA; 

(c) (AB) T = (BA) T = A T B T = (~A)B = -AB. If (AB)T = -AB, then 
AB = -(AB) T = -B T A T = -B(-A) = BA. 



IV-9 



1. (a) 



"2 f " 



(c) 



"1 1 


2] 


1 3 


i 

2 


2 A 

l_Z 2 


7J 



(«) 



"1 


2 


r 


2 


4 


2 


-1 


2 


1. 



2. (a) 2x^2 + 1^2/2 + f#i# 2 + 6^2/2 (if ( x i, Vi) and (x 2 , ?/ 2 ) are the coordinates 
of the two points), (c) x x x 2 + x x y 2 + x 2 y 1 + 2x x z 2 + lx 2 z x + 3y x y 2 + \y-& 2 + 
\y&\ + 7z 1 z 2 , (e) x x x 2 + 2x x y 2 + 2x 2 y x + 4y x y 2 + x x z 2 + x 2 z x + z x z 2 + 2y x z 2 + 
1y 2 z x . 

IV-10 

1. (In this and the following exercises the matrix of transition P, the order of 
the elements in the main diagonal of P T BP, and their values, which may be 
multiplied by perfect squares, are not unique. The following answers can only 

T -2 2" 



be thought of as representative possibilities.) (a) P = 



-2 
U 



The diagonal of P T BP is {1, -3, 9}; (b) 



ri 


2 





-1 


Lo 






2 

-7 
4J 



{1, -4,68}; 



(c) 



"0 


1 





-1 


1 








2 








1 


2 





1 





1 



,{1,4, -1, -4}. 



Answers to Selected Exercises 



335 



2. (a) P = 



1 -3" 
4 



, {2, 78} (c) 



1 


1 


-11" 





-1 


3 


.0 





4 



,0,2,30}; 



to 



1 


-2 


-r 





1 











i_ 



0,0,0}. 



IV-11 



1. (a)r=2,S = 2; (6)2,0; (c)3,3; (d)2,0; (e)l,l; (/) 2, 0; (,§03,1. 



2. IfP = 



1 -b\2 
L0 a J 



, then P T BP = 



a 

_0 (a/4)(-6 2 +4ac)J 
3. There is a non-singular g such that Q T AQ = I. Take P = Q~ . 

4 Let P = /4 — 1 

5* There is a non-singular Q such that Q T AQ = B has r l's along the main 

6. £f°£i ? = to,? T.' *>, r*r = 2«^ * "• ™» ™* = 

(AX) T (AX) = Y T Y > for all real Z = (x lt . . . , x n ). 
7 If y = (2/1, • • • ,y n ) * 0,then 7^7 > 0. HA * 0, there is an X = (x lt . . . ,x n ) 
such that AX = r ?* 0(why?). Then we would have = X T A T AX = Y T Y > 

9. If ^JT * for any i, then = X^U ^ x = 25-i X**?** > °- Thus 
,4 ,jjf = for all X and ,4, = 0. 



IV-12



1. (a) P - 



.0 U 



.diagonal ={1,0} (b) 



1 -1 + f 



L0 



1 J 



,{!,-!}• 



3-9. Proofs are similar to those for Exercises 3-9 of Section 11. 
10. Similar to Exercise 14 of Section 8. 



V-1



1. 6. 2. 2i. 

6. (a - 0, a + P) = (a, a) - (/S, a) + (a, /?) - (0, 0) = l|a|| 2 - 

7. || a + /5 ||2 = || a ||2 +2(a,/3) + ||/S|| 2 . 

11. (4- (0, 1, >. 0), HO, 2, -2, -1), K-3, -2, 2, -8)j. 



336 Answers to Selected Exercises 

12. x* -\,x* - 3a>/5. 

13. (a) If 2r=i a & = °> then J^ «,,(*„ *,) = (f„ ^ <,,*,) = (^, ) = for 
each /. Thus 25-i#« fl * = and the columns of G are dependent. (b) If 
7L%xg<&i = for each /, then = J™ t «,(£, £) = (f„ £« x a^) for each i. 
Hence, 2f=i^(^-,2r=i«^) = (If=i^ 2r=i«^) =0. Thus ^^ = 
0. (c) Let A = { ai , . . . , a n } be orthonormal, f, = 2"=i«*j a r Then ^„ = 
(ft. f>) = ^k=i^ki a ki- Thus G = ^*/i where ^ = [a i? -]. 



V-2 

1. If a = 2?=l a &> then (£<, a) = J?-! «*(**, *i) = «i- 

X is linearly independen 
Since (£,, /S) = 0, /? = 



2. X is linearly independent. Let a £ V and consider = a — V" ^ " „ *' . 
Since ($.. B) = 0.5 = 0. Z * =1 ll*<ll* 



V-4 

1. ((^)*(a), /?) = (a, ardS)) = (**(a), r(/0) = (r*ff*(a), /?). 

2. (<r(a), cr(a)) = (a, <7*cr(«)) = for all a. 

3. (ff*(a), /?) = (a, a(0)) =/(a, 0) = -/(/?, a) = -(/?, a(a)) = _(*(«), 0) = 

5. Let f be an eigenvector corresponding to A. Then A(|, £) = (£, <x(f)) = 
0*(£>, f) = (-Af, I) = -*(f, 5). Thus (A + A) = 0. 

6. cr is skew-symmetric. 7. c is skew-symmetric. 
8. Let $ e W^. Then for all r\ e W, (<r*(f), *?) = (£, <t(jj)) = 0. 

10. Since (tt*) 2 = (tt 2 )* = n*, n* is a projection. £ e ^(tt*) if and only if 
("*(£), *?) = (f , "(»?)) = for all *?; that is, if and only if I e S-L. Finally, 
(tt*(£), »?) = (I, *■(*?)) vanishes for all £ if and only if Tr(rj) = 0; that is, if and 
only if r\ e T. Then tt*(V) £ T-L. Since tt*(V) and T-L have the same dimension, 

11. (£, or(»?)) = (<r*(f), 17) = for all tj if and only if o*($) = 0, or £ £ W^. 

13. By Theorem 4.3, V=W© W^. By Exercise 11, a*(V) = a*(W). 

14. <7*(V) = ff*(W) = a*a(V). a(V) = aa*(V) is the dual statement. 

15. a*(V) = a*a(V) = aa*(V) = a(V). 

16. By Exercises 15 and 11, W- 1 - is the kernel of a* and a. 

21. By Exercise 15, a(V) = a*(V). Then <r 2 (V0 = aa*{V) = a(V) by Exercise 14. 



V-5 

1. Let I be the corresponding eigenvector. Then (£, £) = (<r(l), CT (f)) = 0*f» ^£) 
AA(£, I). 



3. It also maps £ 2 onto ± 



*i"*. 



V2 
4. For example, f 2 ont ° K 2 ^i ~~ 2 ^2 + h) anc * £3 ont o i(2fi + £ 2 — 2 ^)- 



Answers to Selected Exercises 337 



V-6 

1. (a) and (c) are orthogonal. 2. (a). 

5. (a) Reflection in a plane (x x , x 2 -plane). (b) 180° rotation about an axis (x 3 -axis). 
(c) Inversion with respect to the origin, (d) Rotation through about an axis 
(x 3 -axis). (e) Rotation through about an axis (x 3 -axis) and reflection in the 
perpendicular plane (x lf x 2 -plane). The characteristic equation of a third-order 
orthogonal matrix either has three real roots (the identity and (a), (b), and (c) 
represent all possibilities) or two complex roots and one real root ((d) and (e) 
represent these possibilities). 



V-7 

1. Change basis in V as in obtaining the Hermite normal form. Apply the Gram- 
Schmidt process to this basis. 

2. If a( Vj ) = 2Li a i7Vi, then a*(r, k ) = J^ =1 (Vj, <**))>?; = SUMty), Vk)% = 

2?=i (2i=v "aVi, nic)ni = 2U*«^- 

3. Choose an orthogonal basis such that the matrix representing a* is in super- 
diagonal form. 



V-8 

1. (a) normal; (b) normal; (c) normal; (d) symmetric, orthogonal; (V) orthog- 
onal, skew-symmetric; (/) Hermitian; (g) orthogonal; (h) symmetric, 
orthogonal; (/) skew-symmetric; normal; (j) non-normal ; (&) skew-symmetric 
normal. 

2. All but (c) and (/). 3. AT A = (-A)A = -A 2 = AA?. 
"0 -1 -r 



5. 



V-9 



1 -1 
/ 1 



6. Exercise 1(c). 



4. (<r*(°0, /S) = (a, o(fi)) =/(a, /?) =/(/?, /?) = (£, a(a)) = (<7(a), 0). 

5. /(a, 0) = (a, a(fi)) = (£? =1 a^, JjU V(£)) = Q> =1 a, £„ £? =1 W>) = 

iLi MA- 

6. ?(a) =/(a, a) = £? =1 kl 2 A,, Since £ i=1 k-| 2 = 1, min {AJ < a(a) < 

max {A,} for a e S, and both equalities occur. If a ^ 0, there is a real positive 

scalar a such that ao. e S. Then o(<x) = -^ ^(aa) > min {Aj > 0, if all eigen- 
values are > 0. a 

V-10 

1. (a) unitary, diagonal is {1, /}. (b) Hermitian, {2, 0}. (c) orthogonal, {cos + 
/sin0, cos — /sin 0}, where = arccos 0.6. (d) Hermitian, {1,4}. 
(e) Hermitian, {1,1+ yfl, 1 - yfl). 



338 Answers to Selected Exercises 



V-11

1. Diagonal is {15, -5}. (d) {9, -9, -9}. (e) {18, 9, 9}. (/) {-9, 3, 6}. 
(^{-9,0,0}. (/*){1,2,0}. (i){l, -1, -ft}. (/){3,3, -3}. (*){-3,6,6}. 

2. (</), (A). 

3. Since P T BP = B' is symmetric, there is an orthogonal matrix R such that 
ijrg'/? = B" is diagonal matrix. Let Q = PR. Then (PR) T A(PR) = 

RT P T APR = /JTR = l an d (PR)T B ( PR ) = RT P T BPR = /?T£'/? = 5" j s 

diagonal. 

4. ^ T /4 is symmetric. Thus there is an orthogonal matrix Q such that Q T (A T A)Q = 
D is diagonal. Let B = Q T AQ. 

5. Let P be given as in Exercise 3. Then det {P T BP - xl) = det (P T (B - xI)P) = 
detP 2 • det {B — xA). Since P T BP is symmetric, the solutions of det (B — xA) = 
are also real. 

[a bl 

7. Let A = . Then A is normal if and only if b 2 = c 2 and ab + cd = 

_c dj 

ac + bd. If b or c is zero, then both are zero and .4 is symmetric. If a =0, 
then b 2 = c 2 and cd = bd. If d ?* 0, c = d and /I is symmetric. If d = 0, 
either 6 = c and /I is symmetric, or b = —c and A is skew-symmetric. 

8. If b = c, A is symmetric. If b = —c, then a = d. 

9. The first part is the same as Exercise 5 of Section 3. Since the eigenvalues of 
a linear transformation must be in the field of scalars, o has only real eigen- 
values. 

10. a 2 = — a* a is symmetric. Hence the solutions of \A 2 — xl\ = are all 
real. Let A be an eigenvalue of a 2 corresponding to f. Then (<x(f), 
<t(£)) = (£, <7*<t(!)) = (f, (7 2 (£)) = -A(£, I). Thus * ^ 0. Let A = -^ 

a(*0 = 1 <r 2 (f) = I (-^f) = -/,!. cr 2 (r/) = -/uo(Z) = -fn- (£, >?) = 

/X [* 






rj). 



11. <x(£) = ^r?, ff(??) = -/x£. 

12. The eigenvalues of an isometry are of absolute value 1. If <?(£) = A| with A 
real, then a* (I) = AS, so that (<r + <r*)(£) = 2A£. 

13. If (a + (T*)(£) = 2//£and<r(f) = A£,then A = ±1 and2^ = 2A = ±2. Since 
(f , ct(I)) = (a*(£), |) = (|, 0*(£))- 2MI, f) = (£, (* + **)(£)) = 2(f , *(£)). 
Thus \n\ • Hill 2 = |(f, <r(f))| < Hill • ||a(f)|| = Hill 2 , and hence |/*| < 1. If 
\ju\ = 1, equality holds in Schwarz's inequality and this can occur if and only 
if cx(£) is a multiple of £. Since £ is not an eigenvector, this is not possible. 

14. (£, rj) = * {(£, <x(|)) - Mf, I)} = 0. Since <x(£) + <r*(£) = 2/uf, 
<r 2 (£) + f = 2/xo(S). Thus 



, (T 2 (|) - ixa^) fio($) - | /, 2 £ + ^Vl - fx 2 v - f 
ct(?j) = — = — = , 

Vi - ^ Vi - /« 2 Vi - ^ 2 

= - y/l - n 2 + vy. 

1 5. Let f x , ^ x be associated with /x lt and | 2 , j? 2 be associated with /x 2 , where /x 2 ^ /x 2 . 
Then (f x , (a + <r*)(£ 2 )) = (f lf 2/i 2 S 2 ) = ((a + cr*^^), f 2 ) = (2^!^, I 2 ). 
Thus (f lf f a ) = 0. 



Answers to Selected Exercises 
V-12 

2. A + A* =

[  2/3   −4/3     0  ]
[ −4/3    2/3     0  ]
[   0      0   −2/3  ]

The eigenvalues of A + A* are {−2/3, −2/3, 2}. Thus μ = −1/3 and 1. To μ = 1 corresponds an eigenvector of A, which is (1/√2)(1, −1, 0). An eigenvector of A + A* corresponding to −2/3 is (0, 0, 1). If this represents ξ, the triple representing η is (1/√2)(1, 1, 0). The matrix representing σ with respect to the basis {(0, 0, 1), (1/√2)(1, 1, 0), (1/√2)(1, −1, 0)} is

[ −1/3   −2√2/3   0 ]
[ 2√2/3   −1/3    0 ]
[   0       0     1 ]
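The matrix A of the exercise is not reproduced above, but the quoted eigendata determine the construction; the sketch below assembles a hypothetical orthogonal A with exactly those data and checks them:

import numpy as np

mu = -1.0 / 3.0
s = np.sqrt(1 - mu**2)                      # 2*sqrt(2)/3
basis = np.column_stack([(0, 0, 1),
                         (1 / np.sqrt(2), 1 / np.sqrt(2), 0),
                         (1 / np.sqrt(2), -1 / np.sqrt(2), 0)])
block = np.array([[mu, -s, 0.0],            # rotation with cos(theta) = -1/3
                  [s,  mu, 0.0],
                  [0.0, 0.0, 1.0]])
A = basis @ block @ basis.T                 # hypothetical orthogonal A

print(np.allclose(A.T @ A, np.eye(3)))              # True: A is orthogonal
print(np.round(np.linalg.eigvalsh(A + A.T), 6))     # -2/3, -2/3, 2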



VI-1 

1. (1) (1, 0, 1) + t₁(1, 1, 1) + t₂(2, 1, 0); [1 −2 1](x₁, x₂, x₃) = 2.

(2) (1, 2, 2) + t₁(2, 1, −2) + t₂(2, −2, 1); [1 2 2](x₁, x₂, x₃) = 9.

(3) (1, 1, 1, 2) + t₁(0, 1, 0, −1) + t₂(2, 1, −2, 3); [1 0 1 0](x₁, x₂, x₃, x₄) = 2, [−2 1 0 1](x₁, x₂, x₃, x₄) = 1.

2. (7, 2, −1) + t(−6, −1, 4); [1 −2 1](x₁, x₂, x₃) = 2, [1 2 2](x₁, x₂, x₃) = 9.
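These representations are easy to verify numerically; a minimal check of 1(1), 1(2), and 2:

import numpy as np

phi1, phi2 = np.array([1, -2, 1]), np.array([1, 2, 2])

# Plane of 1(1): the base point gives 2, the directions are annihilated.
print(phi1 @ np.array([1, 0, 1]), phi1 @ np.array([1, 1, 1]), phi1 @ np.array([2, 1, 0]))   # 2 0 0

# Line of Exercise 2: it lies in both hyperplanes.
p, d = np.array([7, 2, -1]), np.array([-6, -1, 4])
print(phi1 @ p, phi1 @ d, phi2 @ p, phi2 @ d)   # 2 0 9 0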

3. L₃ = (2, 1, 2) + ⟨(0, 1, −1), (−3, 0, ?)⟩.

4. (1/25)(1, 1) + (11/25)(−6, 7) + (13/25)(5, −6) = (0, 0).

6. Let L₁ and L₂ be linear manifolds. If L₁ ∩ L₂ ≠ ∅, let α ∈ L₁ ∩ L₂. Then L₁ = α + S₁ and L₂ = α + S₂, where S₁ and S₂ are subspaces. Then L₁ ∩ L₂ = α + (S₁ ∩ S₂).

7. Clearly, α₁ + S₁ ⊂ α₁ + ⟨α₂ − α₁⟩ + S₁ + S₂ and α₂ + S₂ = α₁ + (α₂ − α₁) + S₂ ⊂ α₁ + ⟨α₂ − α₁⟩ + S₁ + S₂. On the other hand, let α₁ + S be the join of L₁ and L₂. Then L₁ = α₁ + S₁ ⊂ α₁ + S implies S₁ ⊂ S, and L₂ = α₂ + S₂ ⊂ α₁ + S implies α₂ − α₁ + S₂ ⊂ S. Since S is a subspace, ⟨α₂ − α₁⟩ + S₁ + S₂ ⊂ S. Since α₁ + S is the smallest linear manifold containing L₁ and L₂, α₁ + S = α₁ + ⟨α₂ − α₁⟩ + S₁ + S₂.

8. If α ∈ L₁ ∩ L₂, then L₁ = α + S₁ and L₂ = α + S₂. Thus L₁ J L₂ = α + ⟨α − α⟩ + S₁ + S₂ = α + S₁ + S₂. Since α₁ ∈ L₁ J L₂, L₁ J L₂ = α₁ + S₁ + S₂.

9. If α₂ − α₁ ∈ S₁ + S₂, then α₂ − α₁ = β₁ + β₂, where β₁ ∈ S₁ and β₂ ∈ S₂. Hence α₂ − β₂ = α₁ + β₁. Since α₁ + β₁ ∈ α₁ + S₁ = L₁ and α₂ − β₂ ∈ α₂ + S₂ = L₂, L₁ ∩ L₂ ≠ ∅.

10. If L₁ ∩ L₂ ≠ ∅, then L₁ J L₂ = α₁ + S₁ + S₂. Thus dim L₁ J L₂ = dim (S₁ + S₂). If L₁ ∩ L₂ = ∅, then L₁ J L₂ = α₁ + ⟨α₂ − α₁⟩ + S₁ + S₂ and L₁ J L₂ ≠ α₁ + S₁ + S₂. Thus dim L₁ J L₂ = dim (S₁ + S₂) + 1.






VI-2 

1. If Y = [y₁ y₂ y₃], then Y must satisfy the conditions Y(1, 1, 0) ≥ 0, Y(1, 0, −1) ≥ 0, Y(0, −1, 1) ≥ 0.

2. {[1 −1 1], [1 −1 −1], [1 1 1]}.

3. {(1, 1, 0), (1, 0, −1), (0, −1, 1), (0, 1, 1), (1, −1, 0), (1, 1, 1)}.

4. {(1, 0, −1), (0, −1, 1), (0, 1, 1)}. Express the omitted generators in terms of the elements of this set.

5. {[−1 −1 2], [1 1 −1], [1 −1 −1]}.

6. {(1, 0, 1), (3, 1, 2), (1, −1, 0)}.

7. Let Y = [−1 −1 2]. Since YA ≥ 0 and YB = −2 < 0, (1, 1, 0) ∉ C₂.

8. Let Y = [−2 −2 1].    9. Let Y = [1 −2].

10. This is the dual of Theorem 2.14.    11. Let Y = [2 2 1].

12. Let Â = {φ₁, ..., φₙ} be the dual basis to A. Let φ = Σᵢ₌₁ⁿ φᵢ. Then ξ is semi-positive if and only if ξ ≥ 0 and φξ > 0. In Theorem 2.11, take β = 0 and c = 1. Then ψβ = 0 < c = 1 for all ψ, and the last condition in (2) of Theorem 2.11 need not be stated. The stated theorem then follows immediately from Theorem 2.11.

14. Using the notation of Exercise 13, either (1) there is a semi-positive ξ such that σ(ξ) = 0, that is, ξ ∈ W, or (2) there is a ψ ∈ V̂ such that σ̂(ψ) > 0. Let φ = σ̂(ψ). For ξ ∈ W, φξ = σ̂(ψ)ξ = ψσ(ξ) = 0. Thus φ ∈ W⊥.

15. Take β = 0, c = 1, and φ = Σᵢ₌₁ⁿ φᵢ, where {φ₁, ..., φₙ} is the basis of P⊥.



VI-3 

1. Given A, B, C, the primal problem is to find X ≥ 0 which maximizes CX subject to AX ≤ B. The dual problem is to find Y ≥ 0 which minimizes YB subject to YA ≥ C.

2. Given A, B, C, the primal problem is to find X ≥ 0 which maximizes CX subject to AX = B. The dual problem is to find Y which minimizes YB subject to YA ≥ C.
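A small check of this primal/dual pair, using made-up data (the A, B, C below are hypothetical, not taken from the exercises):

import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0], [1.0, 3.0]])
B = np.array([6.0, 12.0])
C = np.array([2.0, 3.0])

# Primal: maximize CX subject to AX <= B, X >= 0 (linprog minimizes, so negate C).
primal = linprog(-C, A_ub=A, b_ub=B, bounds=[(0, None)] * 2)
# Dual: minimize YB subject to YA >= C, Y >= 0 (rewrite YA >= C as -A^T y <= -C).
dual = linprog(B, A_ub=-A.T, b_ub=-C, bounds=[(0, None)] * 2)

print(-primal.fun, dual.fun)    # the two optimal values coincide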

6. The pivot operation uses only the arithmetic operations permitted by the 
field axioms. Thus no tableau can contain any numbers not in any field 
containing the numbers in the original tableau. 

7. Examining Equation (3.7) we see that φξ′ will be smaller than φξ if cₖ − dₖ < 0. This requires a change in the first selection rule. The second selection rule is imposed so that the new ξ′ will be feasible, so this rule should not be changed. The remaining steps constitute the pivot operation and merely carry out the decisions made in the first and second steps.
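A minimal sketch of a pivot operation of the kind referred to in Exercises 6 and 7; it uses only field operations, so exact rational arithmetic is preserved (the sample tableau is hypothetical):

from fractions import Fraction

def pivot(tableau, r, k):
    """Pivot the tableau (a list of lists) on row r, column k."""
    p = tableau[r][k]
    tableau[r] = [x / p for x in tableau[r]]          # scale the pivot row
    for i, row in enumerate(tableau):
        if i != r and row[k] != 0:
            factor = row[k]
            tableau[i] = [x - factor * y for x, y in zip(row, tableau[r])]
    return tableau

T = [[Fraction(v) for v in row] for row in [[2, 1, 1, 0, 10],
                                            [1, 3, 0, 1, 15]]]
print(pivot(T, 0, 0))   # every entry stays a rational number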

8. Start with the equations

x₁ + 2x₂ + x₃ = 6
x₁ + 4x₂ + x₄ = 10
x₁ − x₂ + x₅ = 3.

The first feasible solution is (0, 0, 6, 10, 3). The optimal solution is (2, 2, 0, 0, 3). The numbers in the indicator row of the last tableau are (0, 0, −3/2, −1/2, 0).

9. The last three elements of the indicator row of the previous exercise give y₁ = 3/2, y₂ = 1/2, y₃ = 0.






10. The problem is to minimize 6y₁ + 10y₂ + 3y₃ + My₄ + My₅, where M is very large, subject to

y₁ + y₂ + y₃ + y₄ − y₆ = 2
2y₁ + 4y₂ − y₃ + y₅ − y₇ = 5.

When the last tableau is obtained, the row of {dⱼ} will be [6 10 2 2 −2 −2]. The fourth and fifth elements correspond to the unit matrix in the original tableau and give the solution x₁ = 2, x₂ = 2 to Exercise 8.

11. Maximum = 12 at x₁ = 2, x₂ = 5.

12. (0, 0), (0, 2), (1, 4), (2, 5).

15. X and Y meet the test for optimality given in Exercise 14, and both are optimal.

16. AX = B has a non-negative solution if and only if min FZ = 0. 




VI-6 

1. (a) A = (−1)E₁ + 3E₂, with E₁ and E₂ the projections on the eigenspaces; the remaining parts are similar, and the coefficients that remain legible include 2, 5, 3, −3, and −7. [The explicit projection matrices are illegible in the scan.]

2. The legible coefficients in this decomposition are −8 and 2. [The matrices are illegible in the scan.]

4. A = λ₁E₁ + λ₂E₂, where E₁ = [illegible], E₂ = [illegible], and AE₁ = 2E₁ + N₁, where N₁ = [illegible].

6. e^A = e²(·) + e(·). [The two matrices are illegible in the scan.]
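The pattern these answers follow can be reproduced numerically; a minimal sketch with a hypothetical symmetric matrix (the matrices of the exercises are not legible above):

import numpy as np

# Spectral decomposition A = sum_i lambda_i E_i: E_i is the orthogonal
# projection onto the eigenspace of lambda_i.
A = np.array([[2.0, 1.0], [1.0, 2.0]])
vals, vecs = np.linalg.eigh(A)

decomposition = []
for lam in np.unique(np.round(vals, 10)):
    V = vecs[:, np.isclose(vals, lam)]      # orthonormal basis of the eigenspace
    E = V @ V.T                             # projection onto that eigenspace
    decomposition.append((lam, E))

S = sum(lam * E for lam, E in decomposition)
print(np.allclose(S, A))                                   # True: A = sum lambda_i E_i
print([np.allclose(E @ E, E) for _, E in decomposition])   # each E_i is idempotent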



VI-8 



1. V = ½(x₂ − x₁)² + ½[½(x₂ − x₃) − (√3/2)(y₂ − y₃)]² + ½[½(x₃ − x₁) + (√3/2)(y₃ − y₁)]².

2. [Illegible in the scan.]

4. These displacements represent translations of the molecule in the plane con- 
taining it. They do not distort the molecule, do not store potential energy, 
and do not lead to vibrations of the system. 



VI-9 

2. π = (124), σ = (234), σπ = (134), ρ = σπ⁻¹ = (12)(34).

3. Since the subgroup is always one of its cosets, the alternating group has only 
two cosets in the full symmetric group, itself and the remaining elements. 
Since this is true for both right and left cosets, its right and left cosets are equal. 
D(e) = D((123)) = D((132)) = [1], D((12)) = D((13)) = D((23)) = [−1].
7. The matrix appearing in (9.7) is H = [illegible in the scan]; the matrix of transition is then P = (1/(2√6)) · [illegible in the scan].

10. G is commutative if and only if every element is conjugate only to itself. By Theorem 9.11 and Equation (9.30), each n_r = 1.

11. Let ζ = e^(2πi/n) be a primitive nth root of unity. If a is a generator of the cyclic group, let D^k(a) = [ζ^k], k = 0, ..., n − 1. For the cyclic group C₄ and for the four-group the character tables are

    C₄   D¹   1    1    1    1        Four-group   D¹   1    1    1    1
         D²   1    i   −1   −i                     D²   1    1   −1   −1
         D³   1   −1    1   −1                     D³   1   −1    1   −1
         D⁴   1   −i   −1    i                     D⁴   1   −1   −1    1
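A short computational sketch of the construction in Exercise 11, generating the character table of a cyclic group from powers of a primitive root of unity:

import numpy as np

def cyclic_character_table(n):
    # The k-th one-dimensional representation sends the generator a to [zeta**k].
    zeta = np.exp(2j * np.pi / n)
    return np.array([[zeta ** (j * k) for j in range(n)] for k in range(n)])

print(np.round(cyclic_character_table(4), 6))
# rows (1,1,1,1), (1,i,-1,-i), (1,-1,1,-1), (1,-i,-1,i): the C4 table above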



12. By Theorem 9.12 each n_r divides p². But n_r = p or p² is impossible because of (9.30) and the fact that there is at least one representation of dimension 1. Thus each n_r = 1, and the group is commutative.

16. Since ab must be of order 1 or 2, we have (ab)² = e, or ab = b⁻¹a⁻¹. Since a and b are of order 1 or 2, a⁻¹ = a and b⁻¹ = b.

17. If G is cyclic, let a be a generator of G, let ζ = e^(πi/4), and define D_k(a) = [ζ^k]. If G contains an element a of order 4 and no element of higher order, then G contains an element b which is not a power of a. b is of order 2 or 4. If b is of order 4, then b² is of order 2. If b² is a power of a, then b² = a². Then c = ab is of order 2 and not a power of a. In any event there is an element c of order 2 which is not a power of a. Then G is generated by a and c. If G contains elements of order 2 and no higher, let a, b, c be three distinct elements of order 2. They generate the group. Hints for obtaining the character tables for these last two groups are given in Exercises 21, 25, and 26.

18. The character tables for these two non-isomorphic groups are identical. 

29. 1² + 1² + 2² + 3² + 3².

30. H₄ contains C₁ (the conjugate class containing only the identity), C₃ (the class containing the eight 3-cycles), and C₅ (the class containing the three pairs of interchanges).






VI-10 

2. The permutation (123) is represented by the 6 × 6 matrix whose 2 × 2 blocks are copies of the rotation

[ −1/2   −√3/2 ]
[  √3/2   −1/2 ]

placed according to the cyclic permutation of the three atoms; the representation of (12) is the 6 × 6 matrix whose 2 × 2 blocks are copies of the reflection

[ −1   0 ]
[  0   1 ]

placed according to the interchange of the first two atoms.
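A sketch of this block construction; the atom labelling, coordinate ordering (x₁, y₁, x₂, y₂, x₃, y₃), and choice of reflection axis below are assumptions for illustration, not necessarily the book's conventions:

import numpy as np

def rep(perm, block):
    """6x6 matrix: atom i's displacement is transformed by `block` and moved
    to the slot of atom perm[i]."""
    P = np.zeros((3, 3))
    for i, j in enumerate(perm):
        P[j, i] = 1.0
    return np.kron(P, block)

R120 = np.array([[-0.5, -np.sqrt(3) / 2], [np.sqrt(3) / 2, -0.5]])   # rotation by 120 degrees
S = np.array([[-1.0, 0.0], [0.0, 1.0]])                              # reflection

D123 = rep((1, 2, 0), R120)     # atom 0 -> 1 -> 2 -> 0
D12 = rep((1, 0, 2), S)         # atoms 0 and 1 interchanged
print(np.allclose(D12 @ D12, np.eye(6)),
      np.allclose(np.linalg.matrix_power(D123, 3), np.eye(6)))       # True True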





3. c₁ = 1, c₂ = 1, c₃ = 2.

4. ξ₁ = (−√3/2, −1/2, √3/2, −1/2, 0, 1). The displacement is a uniform expansion of the molecule.

5. ξ₂ = (−1/2, √3/2, −1/2, −√3/2, 1, 0). This displacement is a rotation of the molecule without storing potential energy.

6. {ξ₃ = (1, 0, 1, 0, 1, 0), ξ₄ = (0, 1, 0, 1, 0, 1)}. This subspace consists of translations without distortion in the plane containing the molecule.

7. This subspace is spanned by the vectors ξ₅ and ξ₆ given in Exercise 8.





Notation

S_μ: μ ∈ M
{α | P}
⟨A⟩, 12
+ (for sets)
Im(σ)
Hom(U, V)
K(σ), 31
ν(σ), 31
U/K, 80
sgn π, 87
det A, 89
|a_ij|, 89
adj A, 95
C(λ)
S(λ)
T(U)
V̂ (space), 129
Â (basis), 130
W⊥, 139, 191
R
A × B, 147
⊕ᵢ₌₁ Wᵢ, 148
f_S, 159
f_SS, 159
Ā, 171
A*, 171
(α, β), 177
‖α‖, 177
d(α, β), 177
η, 186
σ*, 189
W₁ ⊥ W₂, 191
A(S), 225
H(S), 228
L₁ J L₂, 229
W⁺, 230
P (positive orthant), 234
≥ (for vectors), 234
> (for vectors), 238
f′(ξ; η), 262
df(ξ), 263
e^A, 275
X (for n-tuples), 278
D(G) (representation), 294
χ, 298






Index 



Abelian group, 8, 293 

Addition, of linear transformations, 29 

of matrices, 39 

of vectors, 7 
Adjoint, of a linear transformation, 189 

of a system of differential equations, 283 
Adjunct, 95 
Affine, closure, 225 

combination, 224 

n-space, 9 
Affinely dependent, 224 
Algebraically closed, 106 
Algebraic multiplicity, 107 
Alternating group, 308 
Annihilator, 139, 191 
Associate, 76 
Associated, homogeneous problem, 64 

linear transformation, 192 
Associative algebra, 30 
Augmented matrix, 64 
Automorphism, 46, 293 

inner, 293 

Basic feasible vector, 243 
Basis, 15 

dual, 130 

standard, 69 
Bessel's inequality, 183 
Betweenness, 227 
Bilinear form, 156 
Bounded linear transformation, 260 

Cancellation, 34 

Canonical, dual linear programming 
problem, 243 

linear programming problem, 242 

mapping, 79 
Change of basis, 50 
Character, of a group, 298 

table, 306 




Characteristic, equation, 100 

matrix, 99 

polynomial, of a matrix, 100 

of a linear transformation, 107 

value, 106 
Characterizing equations of a subspace, 69 
Codimension, 139 
Codomain, 28 
Cofactor, 93 
Column rank, 41 
Commutative group, 8, 293 
Companion matrix, 103 
Complement, of a set, 5 

of a subspace, 23 
Complementary subspace, 23 
Complete inverse image, 27 
Completely reducible representation, 295 
Complete orthonormal set, 183 
Completing the square, 166 
Component of a vector, 17 
Cone, convex, 230 

dual, 230 

finite, 230 

polar, 230 

polyhedral, 231 

reflexive, 231 
Congruent matrices, 158 
Conjugate, bilinear form, 171 

class, 294 

elements in a group, 294 

linear, 171 

space, 129 
Continuously differentiable, 262, 265 
Continuous vector function, 260 
Contravariant vector, 137, 187 
Convex, cone, 230 

hull, 228 

linear combination, 227 

set, 227 
Coordinate, function, 129 

space, 9 






Coordinates of a vector, 17 
Coset, 79 

Covariant vector, 137, 187 
Cramer's rule, 97 

Degenerate linear programming problem, 

246 
Derivative, of a matrix, 280 

of a vector function, 266 
Determinant, 89 

Vandermonde, 93 
Diagonal, main, 38 

matrix, 38, 113 
Differentiable, 261, 262, 265 
Differential of a vector function, 263 
Dimension, of a representation, 294 

of a vector space, 15
Direct product, 150 
Direct sum, external, 148, 150 

internal, 148 

of representations, 296 

of subspaces, 23, 24 
Directional derivative, 264 
Direct summand, 24 
Discriminant of a quadratic form, 199 
Distance, 177 
Divergence, 267 
Domain, 28 
Dual, bases, 142 

basis, 134 

canonical linear programming problem, 
243 

cone, 230 

space, 129 

spaces, 134 

standard linear programming problem, 
240 
Duality, 133 

Eigenspace, 107 
Eigenvalue, 104, 192 

problem, 104 
Eigenvector, 104, 192 
Elementary, column operations, 57 

matrices, 58 

operations, 57 
Elements of a matrix, 38 
Empty set, 5 
Endomorphism, 45 
Epimorphism, 28 



Equation, characteristic, 100 

minimum, 100 
Equations, linear, 63 

linear differential, 278 

standard system, 70 
Equivalence, class, 75 

relation, 74 
Equivalent representations, 296 
Euclidean space, 179 
Even permutation, 87 
Exact sequence, 147 
Extreme vector, 252 

Factor, group, 293 

of a mapping, 81 

space, 80 
Faithful representation, 294 
Feasible, linear programming problem, 
241, 243 

subset of a basis, 243 

vector, 241, 243 
Field, 5 
Finite, cone, 230 

dimensional space, 15 

sampling theorem, 212 
Flat, 220 
Form, bilinear, 156 

conjugate bilinear, 171 

Hermitian, 171 

linear, 129 

quadratic, 160 
Four-group, 309 
Fourier coefficients, 182 
Functional, linear, 129 
Fundamental solution, 280 

General solution, 64 

Generators of a cone, 230 

Geometric multiplicity, 107 

Gradient, 136 

Gramian, 182 

Gram-Schmidt orthonormalization 

process, 179 
Group, 8, 292 

abelian, 8, 293 

alternating, 308 

commutative, 8, 293 

factor, 293 

order of, 293 

symmetric, 308 






Half-line, 230 

Hamilton-Cayley theorem, 100
Hermite normal form, 55 
Hermitian, congruent, 172 

form, 171 

matrix, 171 

quadratic form, 171 

symmetric, 171 
Homogeneous, associated problem, 64 
Homomorphism, 27, 293 
Hyperplane, 141, 220 

Idempotent, 270 
Identity, matrix, 46 

permutation, 87 

representation, 308 

transformation, 29 
Image, 27, 28 

inverse, 27 
Independence, linearly, 11
Index set, 5 
Indicators, 249 
Induced operation, 79 
Injection, 146, 148 
Inner, automorphism, 293 

product, 177 
Invariant, subgroup, 293 

subspace, 104 

under a group, 294 
Inverse, image, 27 

matrix, 46 

transformation, 43 
Inversion, of a permutation, 87 

with respect to the origin, 37 
Invertible, matrix, 46 

transformation, 46 
Irreducible representation, 271, 295 
Isometry, 194 
Isomorphic, 18 
Isomorphism, 28, 293 

Jacobian matrix, 266 

Join, 229 

Jordan normal form, 118 

Kernel, 31 
Kronecker delta, 15 
Kronecker product, 310 

Lagrange interpolation formula, 132 



Lagrangian, 287 
Length of a vector, 177 
Line, 220 

segment, 227 
Linear, 1 

algebra, 30 

combination, 11 

non-negative, 230 

conditions, 221 

constraints, 239 

dependence, 11 

form, 129 

functional, 129 

independence, 11 

manifold, 220 

problem, 63 

relation, 11 

transformation, 27 
Linearly, dependent, 11 

independent, 11 
Linear programming problem, 239 
Linear transformation, 27 

addition of, 29 

matrix representing, 38 

multiplication of, 30 

normal, 203 

scalar multiple of, 30 

symmetric, 192 

Main diagonal, 38 
Manifold, linear, 220 
Mapping, canonical, 29 

into, 27 

natural, 29 

onto, 28 
Matrix polynomial, 99 
Matrix, 37 

addition, 39 

characteristic, 99 

companion, 103 

congruent, 158 

diagonal, 38 

Hermitian, 171 
congruent, 172 

identity, 46 

normal, 201 

of transition, 50 

product, 40 

representing, 38 

scalar, 46 






Matrix (continued) 

sum, 39 

symmetric, 158 

unit, 46 

unitary, 194 
Maximal independent set, 14 
Mechanical quadrature, 256 
Minimum, equation, 100 

polynomial, 100 
Monomorphism, 27 
Multiplicity, algebraic, 107 

geometric, 107 

n-dimensional coordinate space, 9 

Nilpotent, 274 

Non-negative, linear combination, 230 

semi-definite, Hermitian form, 168 
quadratic form, 173 
Non-singular, linear transformation, 

matrix, 46 
Non-trivial linear relation, 11
Norm of a vector, 177 
Normal, coordinates, 287 

form, 76 

Hermite form, 55 

Jordan form, 118 

linear transformation, 203 

matrix, 201 

over the real field, 176 

subgroup, 293 
Normalized vector, 178 
Normalizer, 294 
Nullity, of a linear transformation, 31 

of a matrix, 41 

Objective function, 239 
Odd permutation, 87 
One-to-one mapping, 27 
Onto mapping, 28 
Optimal vector, 241 
Order, of a determinant, 89 

of a group, 293 

of a matrix, 37 
Orthant, positive, 234 
Orthogonal, linear transformation, 270 

matrix, 196 

similar, 197 

transformation, 194 

vectors, 138, 178 
Orthonormal, basis, 178 



Parallel, 221 

Parametric representation, 221 
Parity of a permutation, 88 
Parseval's identities, 183 
Particular solution, 63 
Partitioned matrix, 250 
Permutation, 86 

even, 87 

identity, 87 

group, 308 

odd, 87 
Phase space, 285 
Pivot, element, 249 

operation, 249 
Plane, 220 
Point, 220 
Pointed cone, 230 
Polar, 162 

cone, 230 

form, 161 
Pole, 162 

Polyhedral cone, 231 
Polynomial, characteristic, 100 

matrix, 99 

minimum, 100 
Positive, orthant, 234 

vector, 238 
Positive-definite, Hermitian form, 173

quadratic form, 168 
Primal linear programming problem, 240 
Principal axes, 287 
Problem, associated homogeneous, 64 

eigenvalue, 104 

linear, 63 
Product set, 147 
Projection, 35, 44, 149 
Proper subspace, 20 

Quadratic form, 160 

Hermitian, 171 
Quotient space, 80 

Rank, column, 41 

of a bilinear form, 164 

of a Hermitian form, 173 

of a linear transformation, 31 

of a matrix, 41 

row, 41 
Real coordinate space, 9 
Reciprocal basis, 188 






Reducible representation, 295 
Reflection, 43 
Reflexive, cone, 231 

law, 74 

space, 133 
Regular representation, 301 
Relation, of equivalence, 74 

linear, 11 
Representation, identity, 308 

irreducible, 271, 295 

of a bilinear form, 157 

of a change of basis, 50 

of a group, 294 

of a Hermitian form, 171 

of a linear functional, 130 

of a linear transformation, 38 

of a quadratic form, 161 

of a vector, 18 

parametric, 221 

reducible, 295 
Representative of a class, 75 
Resolution of the identity, 271 
Restriction, mapping, 84 

of a mapping, 84 
Rotation, 44 
Row-echelon, form, 55 

Sampling, function, 254 

theorem, 253 
Scalar, 7 

matrix, 46 

multiplication, of linear transformations, 
30 

of matrices, 39 
of vectors, 7 

product, 177 

transformation, 29 
Schur's lemma, 297 
Schwarz's inequality, 177 
Self-adjoint, linear transformation, 192 

system of differential equations, 283 
Semi-definite, Hermitian form, 173 

quadratic form, 168 
Semi-positive vector, 238 
Sgn, 87 
Shear, 44 
Signature, of a Hermitian form, 173 

of a quadratic form, 168 
Similar, linear transformations, 78 

matrices, 52, 76 



orthogonal, 197 

unitary, 197 
Simplex method, 248 
Singular, 46 
Skew-Hermitian, 193 
Skew-symmetric, bilinear form, 158 

linear transformation, 192 

matrix, 159 
Solution, fundamental, 280 

general, 64 

particular, 63 
Space, Euclidean, 179 

unitary, 179

vector, 7 
Span, 12 

Spectral decomposition, 271 
Spectrum, 270 
Standard, basis, 69 

dual linear programming problem, 240 

primal linear programming problem, 239 
Steinitz replacement theorem, 13 
Straight line, 220 
Subgroup, invariant, 293
Subspace, 20 

invariant under a linear transformation, 
104 

invariant under a representation, 295 
Sum of sets, 39 
Superdiagonal form, 199 
Sylvester's law of nullity, 37 
Symmetric, bilinear form, 158 

group, 308 

Hermitian form, 192 

law, 74 

linear transformation, 192 

matrix, 158 

part of a bilinear form, 159 
Symmetrization of a linear transformation, 

295 
Symmetry, of a geometric figure, 307 

of a system, 312 

Tableau, 248 
Trace, 115, 298
Transformation, identity, 29 

inverse, 43 

linear, 27 

orthogonal, 194 

scalar, 29 

unit, 29 






Transformation (continued) 

unitary, 194 
Transition matrix, 50 
Transitive law, 74 
Transpose of a matrix, 55 
Trivial linear relation, 11 



Unitary, matrix, 196 

similar, 197 

space, 179 

transformation, 194 
Unit matrix, 46 



Vandermonde determinant, 93 
Vector, 7 

feasible, 241, 243 

normalized, 178 

optimal, 241 

positive, 238 

semi-positive, 238 

space, 7 
Vierergruppe (see Four-group), 309

Weierstrass approximation theorem, 185 

Zero mapping, 28 




NERING 



LINEAR ALGEBRA AND 
MATRIX THEORY 

By EVAR D. NERING, Professor of Mathematics,
Arizona State University

"The author presents ideas in linear algebra very effectively with the help
of matrices. ... The introductions are ... excellent and help clarify the
material substantially. The discussions in the introduction as well as in
the body of each chapter are also very illuminating in details. ... This
book is highly recommended as a textbook." — Physics Today on the first
edition of Nering.

The major change in the second edition is in the addition of new material. 
More sophisticated mathematical material is included and may be used 
independently by the reader with advanced knowledge of linear algebra. 
A new section on applications is added which provides an introduction to 
the modern treatment of calculus of several variables. The concept of 
duality receives expanded treatment in this edition. Finally, the appendix 
supplies a "do-it-yourself" kit which allows the reader to make up any 
number of exercises from those in the book. 

The presentation of material is handled with greater clarity and precision. 
The author has made changes in terminology regarding characteristic 
equations and normal linear transformations in order to clarify the es- 
sential mathematical ideas. Notations have been changed to correspond 
to more current usage. 

384 pages 






JOHN WILEY & SONS, Inc. 

605 Third Avenue, New York, N.Y. 10016

New York · London · Sydney · Toronto






471-63178-7 







WILEY